RestCatalog append table is slow (2+s) #1806

@HungYangChang

Question

Hello PyIceberg devs,

I successfully set up Lakekeeper as the catalog and connected it to ADLS Gen2 storage.
I found that table.append takes a long time to finish (2+ s).

Here are my questions:

  1. Is there a faster way to append data to a table through RestCatalog?
  2. If not, is there any way to speed up the write? I have started logging a detailed timing breakdown of the append.
    (PS: I already dug into the source code and saw that there are both append and fast_append.)

Image
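
For reference, a per-phase timing breakdown like the one described above could be collected with a small helper. This is only a sketch: the `timed` helper and the `timings` dict are hypothetical names, and the `time.sleep` calls are stand-ins for the real `catalog.load_table(...)` and `iceberg_table.append(table)` calls.

```python
import time
from contextlib import contextmanager

# Collected durations in seconds, keyed by phase label (hypothetical name).
timings = {}


@contextmanager
def timed(label, sink=timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the block raises.
        sink[label] = time.perf_counter() - start


with timed("load_table"):
    time.sleep(0.01)  # stand-in for catalog.load_table(...)

with timed("append"):
    time.sleep(0.01)  # stand-in for iceberg_table.append(table)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

Using `time.perf_counter` rather than `time.time` avoids jumps from system clock adjustments when measuring short intervals.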

Here is my code

import time

from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name=CATALOG_NAME,
    uri=CATALOG_URL,
    warehouse=CATALOG_WAREHOUSE_PATH,
    token=CATALOG_TOKEN,
    properties={
        "adlfs.account-name": AZURE_STORAGE_ACCOUNT_NAME,
        "adlfs.container": CONTAINER_NAME,
        "adlfs.client-id": AZURE_STORAGE_CLIENT_ID,
        "adlfs.tenant-id": AZURE_STORAGE_TENANT_ID,
        "adlfs.client-secret": AZURE_STORAGE_CLIENT_SECRET,
        "client_secret": AZURE_STORAGE_CLIENT_SECRET,
        "client_id": AZURE_STORAGE_CLIENT_ID,
        "tenant_id": AZURE_STORAGE_TENANT_ID,
        "io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    },
)

# The table already exists in the catalog.

load_start = time.time()
iceberg_table = catalog.load_table(iceberg_table_identifier)
load_end = time.time()

# Time the append. `table` is the data to append.
append_start = time.time()
# Solution 1 (seems slow):
iceberg_table.append(table)

# Solution 2: use a transaction instead of a direct append.
# Fails with an error...
# with iceberg_table.transaction() as txn:
#     txn.append(table)
#     txn.commit_transaction()
append_end = time.time()
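
The error from solution 2 is not shown here. One possible cause, stated as an assumption rather than a diagnosis, is that the object returned by `iceberg_table.transaction()` already commits when the `with` block exits, so the explicit `txn.commit_transaction()` call would attempt a second commit. A minimal sketch of the transaction form without the explicit commit follows; `append_in_transaction` is a hypothetical helper name, and it relies only on the `transaction()` and `append()` calls used in the snippet above.

```python
def append_in_transaction(iceberg_table, arrow_table):
    """Append `arrow_table` to `iceberg_table` inside one transaction.

    Assumption: `iceberg_table.transaction()` returns a context manager
    that commits when the `with` block exits, so `commit_transaction()`
    is deliberately not called here.
    """
    with iceberg_table.transaction() as txn:
        txn.append(arrow_table)
    # Under the assumption above, the commit has happened by this point.
```

This still performs a single commit against the catalog, so it is not expected to be faster than a direct `append` for one batch of data.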

Thanks for your help in advance :)
