This guide demonstrates how to integrate Apache Polaris (Incubating) CLI with Snowflake Open Catalog, Snowflake's managed implementation of Apache Polaris.
Apache Polaris is an open-source catalog for Apache Iceberg that provides multi-engine interoperability, while Snowflake Open Catalog offers a fully managed version that simplifies deployment and operations. By using the Polaris CLI with Open Catalog, you can programmatically manage catalogs, principals, and access controls in your data lakehouse architecture.
This tutorial walks through setting up the complete integration, including AWS IAM configuration, catalog creation, and user management.
This tutorial requires an Open Catalog account and a user with the POLARIS_ACCOUNT_ADMIN role, which is needed to create catalogs and manage principals.
Ensure you have set up the Snowflake CLI and configured Open Catalog for key-pair authentication. If not, follow the steps in the Open Catalog KeyPair guide first.
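The steps in this tutorial call several command-line tools: gh, aws, jq, http (HTTPie), snow, and python. Before starting, it can help to confirm that they are on your PATH. The following is a minimal sketch, assuming a POSIX-compatible shell; `missing_tools` is a hypothetical helper name and not part of any of these CLIs.

```shell
# Hypothetical helper (not part of any CLI): print the name of every
# listed command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v exits non-zero when the command is not found.
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "${tool}"
    fi
  done
}
```

Calling `missing_tools gh aws jq http snow python` prints one line per tool that still needs to be installed, and prints nothing when all of them are available.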
gh repo clone https://github.com/apache/polaris
Run the following command to build and run the Polaris CLI in the current directory:
./polaris --help
Create a .env file in the current directory with the following content:
AWS_ACCESS_KEY_ID='your-access-key-id'
AWS_SECRET_ACCESS_KEY='your-secret-access-key'
AWS_SESSION_TOKEN='your-session-token' # Optional, if using temporary credentials
AWS_REGION=us-west-2
WORK_DIR="${PWD}/work"
PATH_TO_POLARIS_CLI="$PWD/polaris:$PATH"
OC_API_URL="https://your-account.snowflakecomputing.com"
SNOWFLAKE_DEFAULT_CONNECTION_NAME="opencatalog-key"
SNOWFLAKE_ACCOUNT_ID="your-account-id"
PRIVATE_KEY_PASSPHRASE='your-private-key-passphrase'
OC_STORAGE_BUCKET_NAME="${USER}-devrel-oc-demo-polardb"
OC_STORAGE_AWS_ROLE_NAME="${USER}-oc-s3-role"
OC_STORAGE_AWS_ROLE_POLICY_NAME="${USER}-oc-s3-role-policy"
OC_CATALOG_NAME="polardb"
OC_ADMIN_USER_NAME="super_user"
Create the work directory with restrictive permissions:
mkdir -p "${WORK_DIR}"
chmod -R 700 "${WORK_DIR}"
Load the environment variables:
source .env
Tip
Using direnv can help manage environment variables automatically when you enter the directory.
Be sure to hook direnv into your shell by adding its hook line to your shell configuration file (e.g., .bashrc, .zshrc), as described in the direnv guide.
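Many later commands silently misbehave when a variable from .env is empty, so it can be worth checking them once after loading the file. The following is a minimal sketch, assuming a POSIX-compatible shell; `require_vars` is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set and 1 otherwise.
require_vars() {
  missing=0
  for name in "$@"; do
    # eval performs an indirect lookup of the variable called ${name};
    # pass plain variable names only.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `require_vars OC_API_URL OC_CATALOG_NAME OC_STORAGE_BUCKET_NAME WORK_DIR` prints a `missing:` line on standard error for each variable that still needs a value.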
To authenticate with Open Catalog, generate an access token with the Snowflake CLI; the token is used in all subsequent API calls. This takes two steps: generate a JWT, then exchange it for an access token through the Open Catalog API.
Generate the JWT:
export JWT_TOKEN=$(snow connection generate-jwt)
Then use the JWT to request the access token from the Open Catalog API:
export ACCESS_TOKEN=$(http --form POST "${OC_API_URL}/polaris/api/catalog/v1/oauth/tokens" \
Accept:application/json \
scope="session:role:POLARIS_ACCOUNT_ADMIN" \
grant_type="client_credentials" \
client_secret="$JWT_TOKEN" | jq -r ".access_token")
Important
Whenever you get an Unauthorized error, regenerate both the JWT and the access token.
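A failed token request does not stop the pipeline above: when the response carries no access_token field, `jq -r` prints the literal string null, and ACCESS_TOKEN ends up holding that value. A small guard can catch this before the token is passed to the Polaris CLI. This is a sketch only; `token_is_set` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the argument is neither empty
# nor the literal string "null" that jq -r prints for a missing field.
token_is_set() {
  case "${1:-}" in
    "" | null) return 1 ;;
    *) return 0 ;;
  esac
}
```

A typical use would be `token_is_set "${ACCESS_TOKEN}" || echo "token request failed; regenerate the JWT" >&2`. The check confirms only that a value was returned, not that the token is still valid.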
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query 'Account' --output text)"
export OC_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${OC_STORAGE_AWS_ROLE_NAME}"
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalogs list
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalogs create "${OC_CATALOG_NAME:-polardb}" \
--type="INTERNAL" \
--storage-type="S3" \
--role-arn="${OC_ROLE_ARN}" \
--external-id="${AWS_EXTERNAL_ID}" \
--region="${AWS_REGION:-us-west-2}" \
--default-base-location="s3://${OC_STORAGE_BUCKET_NAME}"
Get the catalog information for use in the next steps:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalogs list | jq --arg catalog_name "${OC_CATALOG_NAME:-polardb}" '. | select(.name==$catalog_name)' > "${WORK_DIR}/catalog-info.json"
Verify the catalog information:
jq . "${WORK_DIR}/catalog-info.json"
Create the necessary AWS resources to support the Open Catalog integration, including an S3 bucket for storage and an IAM role with appropriate policies.
Note
This step is optional if you already have an S3 bucket and IAM role set up for Open Catalog.
Create an S3 bucket to store the catalog data. You can use any S3-compatible storage service.
aws s3api create-bucket --bucket "${OC_STORAGE_BUCKET_NAME}" \
--region "${AWS_REGION:-us-west-2}" \
--create-bucket-configuration LocationConstraint="${AWS_REGION:-us-west-2}"
Next, read the IAM user ARN and the external ID from the catalog information saved earlier; both go into the trust policy of the IAM role. The external ID helps prevent the confused deputy problem by ensuring that only Snowflake can assume the role.
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query 'Account' --output text)"
export OC_AWS_USER_ARN=$(jq -r '.storageConfigInfo.userArn' "${WORK_DIR}/catalog-info.json")
export OC_AWS_EXTERNAL_ID=$(jq -r '.storageConfigInfo.externalId' "${WORK_DIR}/catalog-info.json")
Note
We will update the trust policy later to allow Open Catalog to assume the role. We also add the root user of the AWS account so that the setup can be tested from a local machine that has access to the AWS account.
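If catalog-info.json is empty or lacks the storageConfigInfo fields, the two jq extractions above yield an empty string or the literal null, and that value would then be written into the trust policy unnoticed. A quick shape check can catch this early. The sketch below assumes the standard `aws` partition; `looks_like_iam_arn` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the argument has the general
# shape of an IAM ARN in the standard "aws" partition.
looks_like_iam_arn() {
  case "${1:-}" in
    arn:aws:iam::*:*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `looks_like_iam_arn "${OC_AWS_USER_ARN}" || echo "unexpected user ARN: ${OC_AWS_USER_ARN}" >&2` flags a missing value before the trust policy is generated. This is a pattern match only; it does not confirm that the ARN exists in AWS.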
Create the trust policy document:
cat > "${WORK_DIR}/trust-policy.json" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "AWS": [
          "arn:aws:iam::${AWS_ACCOUNT_ID}:root",
          "${OC_AWS_USER_ARN}"
        ]
      },
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "${OC_AWS_EXTERNAL_ID}"
        }
      }
    }
  ]
}
EOF
Verify the trust policy:
jq . "${WORK_DIR}/trust-policy.json"
Create the IAM role with the trust policy:
aws iam create-role \
--role-name "${OC_STORAGE_AWS_ROLE_NAME}" \
--assume-role-policy-document "file://${WORK_DIR}/trust-policy.json"
Create the access policy. It defines two statements, one for S3 object actions and another for bucket-level actions, which together allow the role to perform the necessary operations on the specified S3 bucket.
cat > "${WORK_DIR}/s3-access-policy.json" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::${OC_STORAGE_BUCKET_NAME}/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::${OC_STORAGE_BUCKET_NAME}",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "*"
          ]
        }
      }
    }
  ]
}
EOF
Verify the access policy:
jq . "${WORK_DIR}/s3-access-policy.json"
Create the policy in AWS:
aws iam create-policy \
--policy-name "${OC_STORAGE_AWS_ROLE_POLICY_NAME}" \
--policy-document "file://${WORK_DIR}/s3-access-policy.json"
Finally, attach the policy to the role:
aws iam attach-role-policy \
--role-name "${OC_STORAGE_AWS_ROLE_NAME}" \
--policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${OC_STORAGE_AWS_ROLE_POLICY_NAME}"
List the existing principals in the Polaris catalog:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
principals list
Create a principal named ${OC_ADMIN_USER_NAME}:
Note
This command creates a new principal in the Polaris catalog, which represents a user or service that can interact with the catalog. The response includes the principal's credentials (clientId and clientSecret), which the command saves to ${WORK_DIR}/principal.json for later use.
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
principals create "${OC_ADMIN_USER_NAME}" | jq -r . > "${WORK_DIR}/principal.json"
Create a principal role named ${OC_CATALOG_NAME}_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
principal-roles create "${OC_CATALOG_NAME}_admin"
Now grant the principal role ${OC_CATALOG_NAME}_admin to the principal ${OC_ADMIN_USER_NAME}:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
principal-roles grant \
--principal "${OC_ADMIN_USER_NAME}" \
"${OC_CATALOG_NAME}_admin"
Create a catalog role named ${OC_CATALOG_NAME}_catalog_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalog-roles create \
--catalog "${OC_CATALOG_NAME:-polardb}" \
"${OC_CATALOG_NAME}_catalog_admin"
List the catalog roles in the catalog ${OC_CATALOG_NAME:-polardb}:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalog-roles list "${OC_CATALOG_NAME:-polardb}"
Grant the catalog role ${OC_CATALOG_NAME}_catalog_admin to the principal role ${OC_CATALOG_NAME}_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalog-roles grant \
--catalog "${OC_CATALOG_NAME:-polardb}" \
--principal-role "${OC_CATALOG_NAME}_admin" \
"${OC_CATALOG_NAME}_catalog_admin"
List the catalog roles in the catalog ${OC_CATALOG_NAME:-polardb} that are assigned to the principal role ${OC_CATALOG_NAME}_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalog-roles list \
--principal-role "${OC_CATALOG_NAME}_admin" "${OC_CATALOG_NAME:-polardb}"
Grant the privilege CATALOG_MANAGE_CONTENT to the catalog role ${OC_CATALOG_NAME}_catalog_admin on the catalog ${OC_CATALOG_NAME:-polardb}:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
privileges catalog grant \
--catalog "${OC_CATALOG_NAME:-polardb}" \
--catalog-role "${OC_CATALOG_NAME}_catalog_admin" \
CATALOG_MANAGE_CONTENT
Grant another privilege, TABLE_LIST, to the catalog role ${OC_CATALOG_NAME}_catalog_admin on the catalog ${OC_CATALOG_NAME:-polardb}. This privilege allows listing tables in the catalog.
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
privileges catalog grant \
--catalog "${OC_CATALOG_NAME:-polardb}" \
--catalog-role "${OC_CATALOG_NAME}_catalog_admin" \
TABLE_LIST
List the catalog privileges granted to the catalog role:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
privileges list \
--catalog "${OC_CATALOG_NAME:-polardb}" \
--catalog-role "${OC_CATALOG_NAME}_catalog_admin"
To verify the setup, generate a notebook:
python generate_notebook.py
Open the generated notebook in your Jupyter environment.
To integrate Snowflake with Open Catalog, you can use the Snowflake CLI to create a connection to the Open Catalog. This allows you to query and manage the Apache Iceberg tables directly from Snowflake.
Important
The PRIVATE_KEY_PASSPHRASE set in the .env file is the one used to authenticate with Snowflake Open Catalog. If your Snowflake connection uses a different key and passphrase, unset the variable and set it to the correct value.
Verify that you can connect to your Snowflake account:
snow connection test -c "${SNOWFLAKE_DEFAULT_CONNECTION_NAME}"
Set $SNOWFLAKE_DATABASE to the database where you want to create the Iceberg tables, for example:
export SNOWFLAKE_DATABASE="kamesh_demos"
Extract the client ID and client secret from the principal JSON file created earlier:
export CLIENT_ID=$(jq -r '.clientId' "${WORK_DIR}/principal.json")
export CLIENT_SECRET=$(jq -r '.clientSecret' "${WORK_DIR}/principal.json")
snow sql -c "${SNOWFLAKE_DEFAULT_CONNECTION_NAME}" \
--variable="database_name=${SNOWFLAKE_DATABASE}" \
--variable="schema_name=iceberg" \
--variable="catalog_name=${OC_CATALOG_NAME:-polardb}" \
--variable="catalog_uri=${OC_API_URL}/polaris/api/catalog" \
--variable="client_id=${CLIENT_ID}" \
--variable="client_secret=${CLIENT_SECRET}" \
--filename "$PWD/scripts/snowflake_integration.sql"
Query the Iceberg tables created in the previous step:
snow sql \
-c "${SNOWFLAKE_DEFAULT_CONNECTION_NAME}" \
-q "select * from ${SNOWFLAKE_DATABASE}.iceberg.sflabs_oc_pol_demo_fruits"
snow sql \
-c "${SNOWFLAKE_DEFAULT_CONNECTION_NAME}" \
-q "select * from ${SNOWFLAKE_DATABASE}.iceberg.sflabs_oc_pol_demo_penguins limit 10"
To clean up the resources created during this tutorial, run the following commands.
Clean up the Open Catalog resources:
- Revoke the privilege CATALOG_MANAGE_CONTENT from the catalog role ${OC_CATALOG_NAME}_catalog_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
privileges catalog revoke \
--catalog "${OC_CATALOG_NAME:-polardb}" \
--catalog-role "${OC_CATALOG_NAME}_catalog_admin" \
CATALOG_MANAGE_CONTENT
- Revoke the catalog role ${OC_CATALOG_NAME}_catalog_admin from the principal role ${OC_CATALOG_NAME}_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalog-roles revoke \
--catalog "${OC_CATALOG_NAME:-polardb}" \
--principal-role "${OC_CATALOG_NAME}_admin" \
"${OC_CATALOG_NAME}_catalog_admin"
- Delete the catalog role ${OC_CATALOG_NAME}_catalog_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
catalog-roles delete \
--catalog "${OC_CATALOG_NAME:-polardb}" \
"${OC_CATALOG_NAME}_catalog_admin"
- Revoke the principal role ${OC_CATALOG_NAME}_admin from the principal ${OC_ADMIN_USER_NAME}:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
principal-roles revoke \
--principal "${OC_ADMIN_USER_NAME}" \
"${OC_CATALOG_NAME}_admin"
- Delete the principal role ${OC_CATALOG_NAME}_admin:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
principal-roles delete "${OC_CATALOG_NAME}_admin"
- Delete the principal ${OC_ADMIN_USER_NAME}:
polaris \
--base-url="${OC_API_URL}/polaris" \
--access-token="${ACCESS_TOKEN}" \
principals delete "${OC_ADMIN_USER_NAME}"
Note
The namespaces, tables, and the catalog that holds them are not deleted. Clean them up via the Open Catalog UI if needed.
Clean up all AWS resources created for the Open Catalog integration.
Ensure that $AWS_ACCOUNT_ID is set:
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query 'Account' --output text)"
- Delete the S3 bucket and its contents:
aws s3 rb "s3://${OC_STORAGE_BUCKET_NAME}" --force
- Detach the IAM policy from the role:
aws iam detach-role-policy \
--role-name "${OC_STORAGE_AWS_ROLE_NAME}" \
--policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${OC_STORAGE_AWS_ROLE_POLICY_NAME}"
- Delete the IAM role:
aws iam delete-role \
--role-name "${OC_STORAGE_AWS_ROLE_NAME}"
- Delete the IAM policy:
aws iam delete-policy \
--policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${OC_STORAGE_AWS_ROLE_POLICY_NAME}"
Lastly, remove the JSON files created in the ${WORK_DIR} directory:
find "${WORK_DIR:?}" -name "*.json" -type f -delete
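The `${WORK_DIR:?}` form in the command above makes the shell report an error instead of running find when WORK_DIR is unset or empty. To see what would be removed before deleting anything, a dry-run variant can reuse the same guard. This is a sketch under that assumption; `preview_json_cleanup` is a hypothetical helper name.

```shell
# Hypothetical helper: list the JSON files that the cleanup would delete,
# without deleting them.
preview_json_cleanup() {
  # ${1:?...} aborts a non-interactive shell with an error message
  # when the argument is unset or empty.
  dir="${1:?directory argument must not be empty}"
  find "${dir}" -name "*.json" -type f -print
}
```

Running `preview_json_cleanup "${WORK_DIR}"` prints the matching paths; once the list looks right, the `-delete` form above removes them.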