
RestCatalog append table is slow (2+s) #1806

Open
@HungYangChang

Question

Hello PyIceberg devs,

I successfully set up Lakekeeper as the catalog and connected it to ADLS Gen2 storage.
I found that table.append takes a long time to finish (2+ s).

Here are my questions:

  1. Is there a faster way to append to a table through RestCatalog?
  2. If not, is there any way to speed up the write? I have started logging detailed timings for the append.
    (PS: I already dug into the source code and saw that there are both append and fast_append.)


Here is my code

import time

from pyiceberg.catalog.rest import RestCatalog

# The CATALOG_* and AZURE_* values (and CONTAINER_NAME) are defined elsewhere.
# RestCatalog takes its configuration as keyword properties (name, **properties),
# so the FileIO settings are unpacked with ** rather than passed as one nested dict.
catalog = RestCatalog(
    name=CATALOG_NAME,
    uri=CATALOG_URL,
    warehouse=CATALOG_WAREHOUSE_PATH,
    token=CATALOG_TOKEN,
    **{
        "adlfs.account-name": AZURE_STORAGE_ACCOUNT_NAME,
        "adlfs.container": CONTAINER_NAME,
        "adlfs.client-id": AZURE_STORAGE_CLIENT_ID,
        "adlfs.tenant-id": AZURE_STORAGE_TENANT_ID,
        "adlfs.client-secret": AZURE_STORAGE_CLIENT_SECRET,
        "client_secret": AZURE_STORAGE_CLIENT_SECRET,
        "client_id": AZURE_STORAGE_CLIENT_ID,
        "tenant_id": AZURE_STORAGE_TENANT_ID,
        "io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    },
)

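# `table` (a pyarrow.Table) and iceberg_table_identifier are already prepared.

# Time the table load
load_start = time.time()
iceberg_table = catalog.load_table(iceberg_table_identifier)
load_end = time.time()

# Time the append
append_start = time.time()
# solution 1 (seems slow):
iceberg_table.append(table)

# # solution 2: use an explicit transaction instead of a direct append
# # Fails with an error...
# # (the with-block already commits on exit, so the extra commit_transaction() may be the cause)
# with iceberg_table.transaction() as txn:
#     txn.append(table)
#     txn.commit_transaction()
append_end = time.time()

print(f"load_table: {load_end - load_start:.2f}s, append: {append_end - append_start:.2f}s")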
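
For a more detailed breakdown, the start/end pairs above could be wrapped in a small helper. This is only a sketch using the standard library: the `timed` helper and the `timings` dict are names made up for illustration, and the `time.sleep` calls are stand-ins for `catalog.load_table(...)` and `iceberg_table.append(...)` so the snippet runs without a catalog.

```python
import time
from contextlib import contextmanager

# Hypothetical helper (not part of PyIceberg): collects one duration per label.
timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed steps are still measured.
        timings[label] = time.perf_counter() - start


# Stand-ins for the real calls; replace the sleeps with
# catalog.load_table(...) and iceberg_table.append(table).
with timed("load_table"):
    time.sleep(0.01)

with timed("append"):
    time.sleep(0.02)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

Wrapping each step separately should make it easier to see whether the time goes into loading the table or into the append itself.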

Thanks for your help in advance :)
