Improvements to the Polaris CLI #30

Merged: 6 commits, Jul 31, 2024
1,083 changes: 1,083 additions & 0 deletions docs/command-line-interface.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/entities.md
@@ -32,7 +32,7 @@ For details on how to use Storage Types in the REST API, see [the API docs](../r

A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_.

In Polaris, namespaces can be nested up to 16 levels. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.
In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on.

For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs](../regtests/client/python/docs/CreateNamespaceRequest.md).
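To illustrate the nesting rule, the sketch below splits a dotted namespace such as `a.b.c.d.e.f.g` into its levels and lists every namespace that encloses it. It assumes that `.` separates levels, as in the example above, and that no level name contains a dot. The helper names `namespace_levels` and `enclosing_namespaces` are hypothetical and not part of Polaris.

```python
from typing import List


def namespace_levels(namespace: str) -> List[str]:
    """Split a dotted namespace into its levels, e.g. 'a.b.c' -> ['a', 'b', 'c']."""
    if not namespace:
        raise ValueError("namespace must be non-empty")
    levels = namespace.split(".")
    # Reject inputs like 'a..b' or '.a' that would produce an empty level.
    if any(level == "" for level in levels):
        raise ValueError(f"empty level in namespace: {namespace!r}")
    return levels


def enclosing_namespaces(namespace: str) -> List[str]:
    """Return every namespace that contains `namespace`, outermost first."""
    levels = namespace_levels(namespace)
    # Each proper prefix of the level list names an enclosing namespace.
    return [".".join(levels[:i]) for i in range(1, len(levels))]
```

For example, `enclosing_namespaces("a.b.c")` returns `['a', 'a.b']`.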

30 changes: 13 additions & 17 deletions docs/quickstart.md
@@ -16,7 +16,7 @@

# Quick Start

This guide serves as a introduction to several key entities that can be managed with Polaris, describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Spark and Trino.
This guide serves as an introduction to several key entities that can be managed with Polaris, describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark.

## Prerequisites

@@ -39,23 +39,19 @@ git clone https://github.com/polaris-catalog/polaris.git

#### With Docker

If you plan to deploy Polaris inside [Docker](https://www.docker.com/)], you'll need to install docker itself. For can be done using [homebrew](https://brew.sh/):
If you plan to deploy Polaris inside [Docker](https://www.docker.com/), you'll need to install docker itself. For example, this can be done using [homebrew](https://brew.sh/):

```
brew install docker
brew install --cask docker
```

Once installed, make sure Docker is running. This can be done on macOS with:

```
open -a Docker
```
Once installed, make sure Docker is running.

#### From Source

If you plan to build Polaris from source yourself, you will need to satisfy a few prerequisites first.

Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebre]w(https://brew.sh/) and configure it with jenv:
Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions. For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv:

```
cd ~/polaris
@@ -77,13 +73,13 @@ If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/)
brew install git
```

Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5.0](https://spark.apache.org/releases/spark-release-3-5-0.html).
Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).

```
cd ~
git clone https://github.com/apache/spark.git
cd ~/spark
git checkout branch-3.5.0
git checkout branch-3.5
```

## Deploying Polaris
@@ -128,7 +124,7 @@ For this tutorial, we'll launch an instance of Polaris that stores entities only
When Polaris is launched in in-memory mode, the root `CLIENT_ID` and `CLIENT_SECRET` are printed to stdout on initial startup. For example:

```
Bootstrapped with credentials: {"client-id": "XXXX", "client-secret": "YYYY"}
realm: default-realm root principal credentials: XXXX:YYYY
```

Be sure to take note of these credentials, as we'll be using them below.
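To capture these credentials in a script instead of copying them by hand, you can scan the startup output for the line shown above. The sketch below assumes the exact `realm: <realm> root principal credentials: <client-id>:<client-secret>` format from that example, a client ID with no colon, and values with no whitespace. The function name `parse_root_credentials` is hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches lines such as:
#   realm: default-realm root principal credentials: XXXX:YYYY
# Assumption: the client ID has no ':' and neither value contains whitespace.
_CREDENTIALS_LINE = re.compile(
    r"realm: (?P<realm>\S+) root principal credentials: "
    r"(?P<client_id>[^:\s]+):(?P<client_secret>\S+)"
)


def parse_root_credentials(lines: Iterable[str]) -> Optional[Tuple[str, str, str]]:
    """Return (realm, client_id, client_secret) from the first matching line, or None."""
    for line in lines:
        # search() tolerates log prefixes such as timestamps before the text.
        match = _CREDENTIALS_LINE.search(line)
        if match:
            return match.group("realm"), match.group("client_id"), match.group("client_secret")
    return None
```

Any iterable of lines works, for example an open file that captured the server's stdout.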
@@ -230,10 +226,10 @@ In order to give this principal the ability to interact with the catalog, we mus
--client-id ${CLIENT_ID} \
--client-secret ${CLIENT_SECRET} \
privileges \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
catalog \
grant \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
CATALOG_MANAGE_CONTENT
```

@@ -251,7 +247,7 @@ At this point, we’ve created a principal and granted it the ability to manage

To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API.

This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). With a local Spark clone, we on the `branch-3.5` branch we can run the following:
This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch, we can run the following:

_Note: the credentials provided here are those for our principal, not the root credentials._

@@ -311,10 +307,10 @@ If at any time access is revoked...
--client-id ${CLIENT_ID} \
--client-secret ${CLIENT_SECRET} \
privileges \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
catalog \
revoke \
--catalog quickstart_catalog \
--catalog-role quickstart_catalog_role \
CATALOG_MANAGE_CONTENT
```
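After this change, the `grant` and `revoke` invocations share one shape: the `privileges catalog <action>` subcommands come first, then `--catalog`, `--catalog-role`, and the privilege. The sketch below assembles that argument list in Python, for example to drive the CLI from a test script. The helper name `catalog_privilege_argv` is hypothetical. It also assumes the `./polaris` wrapper is invoked from the repository root with `--client-id`/`--client-secret`, as in the quickstart commands.

```python
from typing import List


def catalog_privilege_argv(action: str, client_id: str, client_secret: str,
                           catalog: str, catalog_role: str, privilege: str) -> List[str]:
    """Build argv for `./polaris ... privileges catalog {grant|revoke} ...`."""
    if action not in ("grant", "revoke"):
        raise ValueError(f"unsupported action: {action}")
    return [
        "./polaris",
        "--client-id", client_id,
        "--client-secret", client_secret,
        # Subcommands first; the catalog options follow the action.
        "privileges", "catalog", action,
        "--catalog", catalog,
        "--catalog-role", catalog_role,
        privilege,
    ]
```

To execute the command, pass the resulting list to `subprocess.run(argv, check=True)`. A list avoids shell quoting issues with secrets.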

6 changes: 6 additions & 0 deletions polaris
@@ -31,5 +31,11 @@ fi

pushd $SCRIPT_DIR > /dev/null
PYTHONPATH=regtests/client/python ${SCRIPT_DIR}/polaris-venv/bin/python3 regtests/client/python/cli/polaris_cli.py "$@"
status=$?
popd > /dev/null

if [ $status -ne 0 ]; then
  exit 1
fi

exit 0
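The wrapper above runs the Python CLI and collapses any non-zero exit status into `1`. If you drive the CLI from Python rather than the shell, the sketch below is a rough equivalent of that status handling. `run_and_normalize` is a hypothetical name, and the sketch runs an arbitrary command without reproducing the wrapper's `pushd`/`PYTHONPATH` setup.

```python
import subprocess
import sys
from typing import Sequence


def run_and_normalize(argv: Sequence[str]) -> int:
    """Run argv and collapse its exit status to 0 (success) or 1 (any failure)."""
    completed = subprocess.run(list(argv))
    # Same rule as the shell wrapper: any non-zero status becomes 1.
    return 0 if completed.returncode == 0 else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python run_cli.py ./polaris <subcommand> ...
    sys.exit(run_and_normalize(sys.argv[1:]))
```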
5 changes: 5 additions & 0 deletions regtests/Dockerfile
@@ -34,7 +34,12 @@ WORKDIR /home/spark/regtests
COPY ./setup.sh /home/spark/regtests/setup.sh
COPY ./pyspark-setup.sh /home/spark/regtests/pyspark-setup.sh
COPY ./client/python /home/spark/regtests/client/python
COPY ./polaris /home/spark

RUN python3 -m venv /home/spark/polaris-venv && \
    . /home/spark/polaris-venv/bin/activate && \
    pip install poetry==1.5.0 && \
    deactivate

RUN ./setup.sh

COPY --chown=spark . /home/spark/regtests
13 changes: 12 additions & 1 deletion regtests/client/python/cli/command/__init__.py
@@ -93,12 +93,23 @@ def options_get(key, f=lambda x: x):
action=options_get(f'{subcommand}_subcommand'),
catalog_name=options_get(Arguments.CATALOG),
catalog_role_name=options_get(Arguments.CATALOG_ROLE),
namespace=options_get(Arguments.NAMESPACE, lambda s: s.split('.')),
namespace=options_get(Arguments.NAMESPACE, lambda s: s.split('.') if s else None),
view=options_get(Arguments.VIEW),
table=options_get(Arguments.TABLE),
privilege=options_get(Arguments.PRIVILEGE),
cascade=options_get(Arguments.CASCADE)
)
elif options.command == Commands.NAMESPACES:
from cli.command.namespaces import NamespacesCommand
subcommand = options_get(f'{Commands.NAMESPACES}_subcommand')
command = NamespacesCommand(
subcommand,
catalog=options_get(Arguments.CATALOG),
namespace=options_get(Arguments.NAMESPACE, lambda s: s.split('.')),
parent=options_get(Arguments.PARENT, lambda s: s.split('.') if s else None),
location=options_get(Arguments.LOCATION),
properties=properties
)

if command is not None:
command.validate()
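The `if s else None` guard covers optional arguments such as `--parent`. When such an argument is not supplied, its value may be `None`, and calling `.split('.')` on it would fail. The `namespaces` command in this diff then joins the levels with `UNIT_SEPARATOR` before passing a namespace to the Iceberg catalog API. The sketch below shows both steps. It assumes `UNIT_SEPARATOR` is the ASCII unit separator `\x1f`, which the Iceberg REST specification uses to separate multipart namespace levels in a URL path. The helper names are hypothetical. Percent-encoding is normally handled by the generated API client; the sketch includes it only to show the resulting path segment.

```python
from typing import List, Optional
from urllib.parse import quote

# Assumption: matches cli.constants.UNIT_SEPARATOR (ASCII 0x1F).
UNIT_SEPARATOR = "\x1f"


def parse_namespace_arg(value: Optional[str]) -> Optional[List[str]]:
    """Split a dotted CLI argument into levels; a missing or empty value yields None."""
    return value.split(".") if value else None


def namespace_path_segment(levels: List[str]) -> str:
    """Join levels with the unit separator and percent-encode them for a URL path."""
    return quote(UNIT_SEPARATOR.join(levels), safe="")
```

For example, `--parent a.b` becomes `['a', 'b']` and is sent as the path segment `a%1Fb`.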
9 changes: 5 additions & 4 deletions regtests/client/python/cli/command/catalog_roles.py
@@ -19,7 +19,8 @@
from pydantic import StrictStr

from cli.command import Command
from cli.constants import Subcommands
from cli.constants import Subcommands, Arguments
from cli.options.option_tree import Argument
from polaris.management import PolarisDefaultApi, CreateCatalogRoleRequest, CatalogRole, UpdateCatalogRoleRequest, \
GrantCatalogRoleRequest

@@ -45,10 +46,10 @@ class CatalogRolesCommand(Command):

def validate(self):
if not self.catalog_name:
raise Exception("Missing required argument: --catalog")
raise Exception(f'Missing required argument: {Argument.to_flag_name(Arguments.CATALOG)}')
if self.catalog_roles_subcommand in {Subcommands.GRANT, Subcommands.REVOKE}:
if not self.principal_role_name:
raise Exception("Missing required argument: --principal")
raise Exception(f'Missing required argument: {Argument.to_flag_name(Arguments.PRINCIPAL_ROLE)}')

def execute(self, api: PolarisDefaultApi) -> None:
if self.catalog_roles_subcommand == Subcommands.CREATE:
@@ -90,4 +91,4 @@ def execute(self, api: PolarisDefaultApi) -> None:
api.revoke_catalog_role_from_principal_role(
self.principal_role_name, self.catalog_name, self.catalog_role_name)
else:
raise Exception(f"{self.catalog_roles_subcommand} is not supported in the CLI")
raise Exception(f'{self.catalog_roles_subcommand} is not supported in the CLI')
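The error messages above now build the flag text from the argument constants with `Argument.to_flag_name` instead of hard-coding strings such as `--catalog`. A message therefore cannot drift out of sync when an argument is renamed. The sketch below shows one plausible version of that conversion. It assumes the constants are snake_case names (for example `catalog_role`) and that flags are their kebab-case form prefixed with `--`. It is a hypothetical stand-in, not the project's implementation.

```python
def to_flag_name(argument_name: str) -> str:
    """Convert an argument name such as 'catalog_role' into the flag '--catalog-role'."""
    return "--" + argument_name.replace("_", "-")


def missing_argument_message(argument_name: str) -> str:
    """Build a message in the same style as the validate() methods above."""
    return f"Missing required argument: {to_flag_name(argument_name)}"
```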
67 changes: 40 additions & 27 deletions regtests/client/python/cli/command/catalogs.py
@@ -19,7 +19,8 @@
from pydantic import StrictStr

from cli.command import Command
from cli.constants import StorageType, CatalogType, Subcommands
from cli.constants import StorageType, CatalogType, Subcommands, Arguments
from cli.options.option_tree import Argument
from polaris.management import PolarisDefaultApi, Catalog, CreateCatalogRequest, UpdateCatalogRequest, \
StorageConfigInfo, ExternalCatalog, AwsStorageConfigInfo, AzureStorageConfigInfo, GcpStorageConfigInfo, \
PolarisCatalog, CatalogProperties
@@ -57,35 +58,42 @@ class CatalogsCommand(Command):
def validate(self):
if self.catalogs_subcommand == Subcommands.CREATE:
if not self.storage_type:
raise Exception(f"Missing required argument:"
f" --storage-type")
raise Exception(f'Missing required argument:'
f' {Argument.to_flag_name(Arguments.STORAGE_TYPE)}')
if not self.default_base_location:
raise Exception(f"Missing required argument:"
f" --default-base-location")
if self.catalog_type == CatalogType.EXTERNAL.value:
if not self.remote_url:
raise Exception(f"Missing required argument for {CatalogType.EXTERNAL.value} catalog:"
f" --remote-url")
raise Exception(f'Missing required argument:'
f' {Argument.to_flag_name(Arguments.DEFAULT_BASE_LOCATION)}')
if self.catalogs_subcommand == Subcommands.UPDATE:
if self.allowed_locations:
if not self.storage_type:
raise Exception(f"Missing required argument when updating allowed locations for a catalog:"
f" --storage-type")
raise Exception(f'Missing required argument when updating allowed locations for a catalog:'
f' {Argument.to_flag_name(Arguments.STORAGE_TYPE)}')

if self.storage_type == StorageType.S3.value:
if not self.role_arn:
raise Exception("Missing required argument for storage type 's3': --role-arn")
raise Exception(f"Missing required argument for storage type 's3':"
f" {Argument.to_flag_name(Arguments.ROLE_ARN)}")
if self._has_azure_storage_info() or self._has_gcs_storage_info():
raise Exception("Storage type 's3' supports the storage configurations --role-arn, "
"--external-id, and --user-arn")
raise Exception(f"Storage type 's3' supports the storage credentials"
f" {Argument.to_flag_name(Arguments.ROLE_ARN)},"
f" {Argument.to_flag_name(Arguments.EXTERNAL_ID)}, and"
f" {Argument.to_flag_name(Arguments.USER_ARN)}")
elif self.storage_type == StorageType.AZURE.value:
if not self.tenant_id:
raise Exception("Missing required argument for storage type 'azure': --tenant-id")
raise Exception("Missing required argument for storage type 'azure':"
f" {Argument.to_flag_name(Arguments.TENANT_ID)}")
if self._has_aws_storage_info() or self._has_gcs_storage_info():
raise Exception("Storage type 'azure' supports the storage configurations --tenant-id, "
"--multi-tenant-app-name, and --consent-url")
elif self._has_aws_storage_info() or self._has_azure_storage_info():
raise Exception("Storage type 'gcs' supports the storage configuration: --service-account")
raise Exception("Storage type 'azure' supports the storage credentials"
f" {Argument.to_flag_name(Arguments.TENANT_ID)},"
f" {Argument.to_flag_name(Arguments.MULTI_TENANT_APP_NAME)}, and"
f" {Argument.to_flag_name(Arguments.CONSENT_URL)}")
elif self.storage_type == StorageType.GCS.value:
if self._has_aws_storage_info() or self._has_azure_storage_info():
raise Exception("Storage type 'gcs' supports the storage credential"
f" {Argument.to_flag_name(Arguments.SERVICE_ACCOUNT)}")
elif self.storage_type == StorageType.FILE.value:
if self._has_aws_storage_info() or self._has_azure_storage_info() or self._has_gcs_storage_info():
raise Exception("Storage type 'file' does not support any storage credentials")

def _has_aws_storage_info(self):
return self.role_arn or self.external_id or self.user_arn
@@ -121,6 +129,11 @@ def _build_storage_config_info(self):
tenant_id=self.tenant_id,
multi_tenant_app_name=self.multi_tenant_app_name
)
elif self.storage_type == StorageType.FILE.value:
config = StorageConfigInfo(
storage_type=self.storage_type.upper(),
allowed_locations=self.allowed_locations
)
return config

def execute(self, api: PolarisDefaultApi) -> None:
@@ -161,17 +174,17 @@ def execute(self, api: PolarisDefaultApi) -> None:
print(catalog.to_json())
elif self.catalogs_subcommand == Subcommands.UPDATE:
catalog = api.get_catalog(self.catalog_name)
default_base_location_properties = {}
if self.default_base_location:
default_base_location_properties = {'default-base-location': self.default_base_location}
catalog.properties = {**default_base_location_properties, **self.properties}

if self.default_base_location or self.properties:
catalog.properties = CatalogProperties(
default_base_location=self.default_base_location,
additional_properties=self.properties
)
request = UpdateCatalogRequest(
current_entity_version=catalog.entity_version,
catalog=catalog
)
if (self.allowed_locations or self._has_aws_storage_info() or self._has_azure_storage_info() or
self._has_gcs_storage_info()):
if (self._has_aws_storage_info() or self._has_azure_storage_info() or self._has_gcs_storage_info() or
self.allowed_locations or self.default_base_location):
request = UpdateCatalogRequest(
current_entity_version=catalog.entity_version,
catalog=catalog,
@@ -180,5 +193,5 @@ def execute(self, api: PolarisDefaultApi) -> None:

api.update_catalog(self.catalog_name, request)
else:
raise Exception(f"{self.catalogs_subcommand} is not supported in the CLI")
raise Exception(f'{self.catalogs_subcommand} is not supported in the CLI')
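The `validate` method in `catalogs.py` encodes, for each storage type, which credential flags are required and which are allowed. The same rules can be expressed as data, which keeps the messages consistent as storage types are added. The sketch below is a hypothetical, table-driven restatement of those checks using plain field names (`role_arn`, `tenant_id`, and so on). It is not how the CLI is implemented, and it raises `ValueError` rather than a bare `Exception`.

```python
from typing import Dict, FrozenSet, Mapping, Optional

# Credential fields accepted per storage type, restated from the checks above.
ALLOWED_CREDENTIALS: Dict[str, FrozenSet[str]] = {
    "s3": frozenset({"role_arn", "external_id", "user_arn"}),
    "azure": frozenset({"tenant_id", "multi_tenant_app_name", "consent_url"}),
    "gcs": frozenset({"service_account"}),
    "file": frozenset(),
}

# Credential fields that must be set for a storage type.
REQUIRED_CREDENTIALS: Dict[str, FrozenSet[str]] = {
    "s3": frozenset({"role_arn"}),
    "azure": frozenset({"tenant_id"}),
}


def _flag(name: str) -> str:
    # Assumed naming convention: snake_case field -> --kebab-case flag.
    return "--" + name.replace("_", "-")


def validate_storage_credentials(storage_type: str,
                                 credentials: Mapping[str, Optional[str]]) -> None:
    """Raise ValueError if required credentials are missing or unsupported ones are set."""
    if storage_type not in ALLOWED_CREDENTIALS:
        raise ValueError(f"Unknown storage type: {storage_type!r}")
    # Only fields with a non-empty value count as provided.
    provided = {name for name, value in credentials.items() if value}
    missing = REQUIRED_CREDENTIALS.get(storage_type, frozenset()) - provided
    if missing:
        flags = ", ".join(_flag(n) for n in sorted(missing))
        raise ValueError(f"Missing required argument for storage type '{storage_type}': {flags}")
    unsupported = provided - ALLOWED_CREDENTIALS[storage_type]
    if unsupported:
        flags = ", ".join(_flag(n) for n in sorted(unsupported))
        raise ValueError(f"Storage type '{storage_type}' does not support: {flags}")
```

A table makes the accepted credentials for each type visible in one place. The if/elif chain in the diff keeps custom wording for each message.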

94 changes: 94 additions & 0 deletions regtests/client/python/cli/command/namespaces.py
@@ -0,0 +1,94 @@
#
# Copyright (c) 2024 Snowflake Computing Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import json
import re
from dataclasses import dataclass
from typing import Dict, Optional, List

from pydantic import StrictStr

from cli.command import Command
from cli.constants import Subcommands, Arguments, UNIT_SEPARATOR
from cli.options.option_tree import Argument
from polaris.catalog import IcebergCatalogAPI, CreateNamespaceRequest, ApiClient, Configuration
from polaris.catalog.exceptions import NotFoundException
from polaris.management import PolarisDefaultApi


@dataclass
class NamespacesCommand(Command):
    """
    A Command implementation to represent `polaris namespaces`. The instance attributes correspond to parameters
    that can be provided to various subcommands.

    Example commands:
        * ./polaris namespaces create --catalog my_catalog my_namespace
        * ./polaris namespaces list --catalog my_catalog
        * ./polaris namespaces delete --catalog my_catalog my_namespace.inner
    """

    namespaces_subcommand: str
    catalog: str
    namespace: List[StrictStr]
    parent: List[StrictStr]
    location: str
    properties: Optional[Dict[str, StrictStr]]

    def validate(self):
        if not self.catalog:
            raise Exception(f'Missing required argument:'
                            f' {Argument.to_flag_name(Arguments.CATALOG)}')

    def _get_catalog_api(self, api: PolarisDefaultApi):
        """
        Convert a management API to a catalog API
        """
        catalog_host = re.match(r'(https?://[^/]+)', api.api_client.configuration.host).group(1)
        configuration = Configuration(
            host=f'{catalog_host}/api/catalog',
            username=api.api_client.configuration.username,
            password=api.api_client.configuration.password,
            access_token=api.api_client.configuration.access_token,
        )
        return IcebergCatalogAPI(ApiClient(configuration))

    def execute(self, api: PolarisDefaultApi) -> None:
        catalog_api = self._get_catalog_api(api)
        if self.namespaces_subcommand == Subcommands.CREATE:
            properties = self.properties or {}
            if self.location:
                properties = {**properties, Arguments.LOCATION: self.location}
            request = CreateNamespaceRequest(
                namespace=self.namespace,
                properties=properties
            )
            catalog_api.create_namespace(
                prefix=self.catalog,
                create_namespace_request=request)
        elif self.namespaces_subcommand == Subcommands.LIST:
            if self.parent is not None:
                result = catalog_api.list_namespaces(prefix=self.catalog, parent=UNIT_SEPARATOR.join(self.parent))
            else:
                result = catalog_api.list_namespaces(prefix=self.catalog)
            for namespace in result.namespaces:
                print(json.dumps({"namespace": '.'.join(namespace)}))
        elif self.namespaces_subcommand == Subcommands.DELETE:
            catalog_api.drop_namespace(prefix=self.catalog, namespace=UNIT_SEPARATOR.join(self.namespace))
        elif self.namespaces_subcommand == Subcommands.GET:
            catalog_api.namespace_exists(prefix=self.catalog, namespace=UNIT_SEPARATOR.join(self.namespace))
            print(json.dumps({"namespace": '.'.join(self.namespace)}))
        else:
            raise Exception(f"{self.namespaces_subcommand} is not supported in the CLI")
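`_get_catalog_api` derives the Iceberg catalog endpoint from the management API host. It keeps only the scheme and host (with any port) and appends `/api/catalog`. The sketch below performs the same derivation with `urllib.parse.urlsplit` instead of a regular expression. This handles both `http` and `https` and rejects URLs that have no host. The function name is hypothetical, and the management paths in the tests are illustrative only.

```python
from urllib.parse import urlsplit


def catalog_host_from_management_host(management_host: str) -> str:
    """Keep scheme and host[:port] from the management URL and append '/api/catalog'."""
    parts = urlsplit(management_host)
    # Require an absolute http(s) URL so the derived endpoint is well-formed.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {management_host!r}")
    return f"{parts.scheme}://{parts.netloc}/api/catalog"
```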