Question
Hello PyIceberg devs,
I successfully set up Lakekeeper as the catalog and connected it to ADLS Gen2 storage.
I found that table.append takes a long time to finish (2+ s).
Here are my questions:
- Is there a faster way to append to a table through RestCatalog?
- If not, is there any way to speed up the write? I have started logging detailed timings for append.
(PS: I already dug into the source code and saw that there are both append and fast_append.)
Here is my code:
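For the timing breakdown mentioned in the second question, a small helper like the one below might be enough. This is only a sketch: `timed` and `timings` are hypothetical names, and the `time.sleep` calls stand in for the real `catalog.load_table(...)` and `iceberg_table.append(table)` calls.

```python
import time
from contextlib import contextmanager

# Collected durations in seconds, keyed by a label chosen by the caller.
timings = {}


@contextmanager
def timed(label, sink=timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start


# Stand-ins for the real calls; replace the sleeps with the actual work.
with timed("load_table"):
    time.sleep(0.01)  # e.g. catalog.load_table(iceberg_table_identifier)
with timed("append"):
    time.sleep(0.02)  # e.g. iceberg_table.append(table)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

Wrapping each step separately should make it easier to tell whether the time goes to loading the table or to the append itself.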
import time

from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name=CATALOG_NAME,
    uri=CATALOG_URL,
    warehouse=CATALOG_WAREHOUSE_PATH,
    token=CATALOG_TOKEN,
    properties={
        "adlfs.account-name": AZURE_STORAGE_ACCOUNT_NAME,
        "adlfs.container": CONTAINER_NAME,
        "adlfs.client-id": AZURE_STORAGE_CLIENT_ID,
        "adlfs.tenant-id": AZURE_STORAGE_TENANT_ID,
        "adlfs.client-secret": AZURE_STORAGE_CLIENT_SECRET,
        "client_secret": AZURE_STORAGE_CLIENT_SECRET,
        "client_id": AZURE_STORAGE_CLIENT_ID,
        "tenant_id": AZURE_STORAGE_TENANT_ID,
        "io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    },
)

# The table already exists; time how long loading it takes
load_start = time.time()
iceberg_table = catalog.load_table(iceberg_table_identifier)
load_end = time.time()

# Time the append itself (`table` is the data to append)
append_start = time.time()

# Solution 1 (seems slow):
iceberg_table.append(table)

# Solution 2: use a transaction instead of a direct append
# Fails with error...
# with iceberg_table.transaction() as txn:
#     txn.append(table)
#     txn.commit_transaction()

append_end = time.time()
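For reference, a minimal sketch of the transaction variant, assuming PyIceberg's `Transaction` is used as a context manager (which commits when the `with` block exits, so no explicit `commit_transaction()` call is made inside it) and that each batch is a PyArrow table. The names `append_batches_in_one_commit` and `batches` are hypothetical, and whether the extra commit call is related to the error in solution 2 is only a guess.

```python
def append_batches_in_one_commit(iceberg_table, batches):
    """Stage several appends and send them to the catalog in one commit.

    `iceberg_table` is assumed to be a table returned by catalog.load_table(...)
    and `batches` an iterable of PyArrow tables matching its schema.
    """
    with iceberg_table.transaction() as txn:
        for batch in batches:
            # Each append is staged on the transaction, not committed yet.
            txn.append(batch)
    # Leaving the with-block commits the transaction, so there is
    # deliberately no txn.commit_transaction() call above.
```

Usage would look like `append_batches_in_one_commit(iceberg_table, [table])`. If several small tables are written back to back, grouping them this way should mean one commit request to the REST catalog instead of one per append, although the data and manifest files are still written for each batch.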
Thanks for your help in advance :)