[DOCS] Update catalog docs to show automatic catalog syncs to Snowflake and Glue #549

Open
wants to merge 3 commits into
base: main

sagarlakshmipathy
Contributor

Important Read

#548

What is the purpose of the pull request

  • Update Glue and Snowflake docs to show better catalog sync methods for Iceberg tables

Brief change log

  • Updated Glue catalog doc
  • Updated Snowflake integration doc

Verify this pull request

  • Trivial docs work
  • Checked with npm start locally

@sagarlakshmipathy
Contributor Author

@vinishjail97 can you review?

Contributor

@ashvina left a comment

Thanks for the PR.

website/docs/glue-catalog.md: four outdated review threads, resolved

* Build Apache XTable™ (Incubating) from [source](https://github.com/apache/incubator-xtable)
* Download `iceberg-aws-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
Contributor

[Clarification] Are AWS libraries required?

Contributor Author

Would you suggest keeping it cloud agnostic? I have only tried with AWS S3 for Snowflake. I'm not even sure what libraries would be needed for GCP and Azure.

Contributor

For Snowflake, we don't need iceberg-aws; it contains the integrations with Glue, DynamoDB, etc.
https://github.com/apache/iceberg/tree/main/aws/src/integration/java/org/apache/iceberg/aws

> I'm not even sure what libraries would be needed for GCP and Azure

For Snowflake, we need permissions (IAM for AWS, a service account for GCP, etc.) and an external volume setup.
https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume#create-an-external-volume

XTable can already read from S3/GCS/Azure Blob/HDFS using the Hadoop library dependencies.
https://github.com/apache/incubator-xtable/blob/main/pom.xml#L360

Contributor

> For Snowflake, we need permissions (IAM for AWS, a service account for GCP, etc.) and an external volume setup.

Please confirm if my understanding below is correct.
Iceberg supports various catalogs, including JDBC and REST. The Snowflake catalog appears to be JDBC-based [1]. Therefore, when connecting XTable to the Snowflake catalog and updating Iceberg tables, a Snowflake JDBC driver should be a dependency [2]. Iceberg’s JDBC catalog clients should not need Spark or AWS dependencies. However, if someone wants to follow this tutorial end-to-end, they may need Spark runtime and AWS libraries.

If this is correct, it would be helpful to separate the prereqs into two sections: one for what XTable needs and another for the tutorial prerequisites.

[1] https://www.snowflake.com/en/blog/iceberg-tables-catalog-support-available-now/
[2] https://iceberg.apache.org/docs/1.5.0/jdbc/

Contributor

@vinishjail97 left a comment

@sagarlakshmipathy Added comments.


**Pre-requisites:**
* Download iceberg-aws-X.X.X.jar from the [Maven repository](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
* Download bundle-X.X.X.jar from the [Maven repository](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle)
Contributor

Should this read "Download AWS Java SDK `bundle-X.X.X.jar`"?

Contributor

[image]

This is unclear from docs.

Comment on lines +64 to +66
* Download `iceberg-spark-runtime-3.X_2.12/X.X.X.jar` from [here](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/1.4.2/)
* Download `snowflake-jdbc-X.X.X.jar` from the [Maven repository](https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc)
Contributor

Include "AWS Java SDK" in the text for the AWS bundle download.


3 participants