RestCatalog append table is slow (2+s) #1806
I added some quick-and-dirty timing logs to pyiceberg.table.append.
Here is the result I got:

[2025-03-18T18:35:19.587Z] set up 0.018 seconds
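One tidier way to do this kind of ad hoc per-phase timing is a small context manager around each step. This is a minimal sketch: the timed helper and the phase names are hypothetical and not part of PyIceberg, and in real use you would wrap your own calls (for example table.append(df)) instead of the simulated sleep.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

logger = logging.getLogger("append_timing")


@contextmanager
def timed(phase: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so slow failures are visible too.
        elapsed = time.perf_counter() - start
        timings[phase] = elapsed
        logger.info("%s %.3f seconds", phase, elapsed)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="[%(asctime)s] %(message)s")
    timings: Dict[str, float] = {}
    # Hypothetical usage: wrap each phase you want to measure, e.g.
    #   with timed("append", timings):
    #       table.append(df)
    with timed("set up", timings):
        time.sleep(0.01)  # simulated work
    print(timings)
```

Using time.perf_counter rather than wall-clock timestamps avoids surprises from system clock adjustments during the measurement.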
Hi @HungYangChang - thanks for posting the logs! A couple of things to unpack here:

iceberg-python/pyiceberg/io/pyarrow.py, lines 2563 to 2569 (at commit a294257)
Instead, it writes the parquet files when the iterator's elements are appended to
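To illustrate why this matters for timing: a Python generator does no work until it is iterated, so timing the call that creates it measures almost nothing, and the cost shows up wherever it is consumed. The toy model below assumes a hypothetical write_data_files generator with a simulated per-file write cost; it is not PyIceberg's actual writer.

```python
import time
from typing import Iterator, List


def write_data_files(batches: List[int], write_cost_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a lazy writer: the body runs only when iterated."""
    for i, _batch in enumerate(batches):
        time.sleep(write_cost_s)  # simulate writing one Parquet file
        yield f"data-{i}.parquet"


start = time.perf_counter()
files_iter = write_data_files([1, 2, 3], write_cost_s=0.05)
create_s = time.perf_counter() - start  # near zero: nothing has been written yet

start = time.perf_counter()
data_files = list(files_iter)  # the simulated writes actually happen here
consume_s = time.perf_counter() - start

print(f"create iterator: {create_s:.3f}s, consume iterator: {consume_s:.3f}s")
print(data_files)  # ['data-0.parquet', 'data-1.parquet', 'data-2.parquet']
```

If the iterator is consumed inside the step you label as the commit, rough timing logs can attribute file-writing time to that step.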
From my observation of the logs, your commit does seem to be taking:
Do you have access to the Lakekeeper logs that would tell you how long the REST catalog takes to process the commit request? Once it accepts the commit request, the REST catalog must write the metadata on its end and then return an HTTP response to PyIceberg. It would be good to compare the commit time from your logs against the request-to-response wall time Lakekeeper reports for your specific commit request.
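The comparison suggested above amounts to subtracting the server-reported duration from the client-observed wall time; whatever remains is spent outside the server (network, TLS, client-side serialization). This is a minimal sketch under the assumption that you read the server-side number by hand from the catalog's logs; LatencyBreakdown and measure are hypothetical helpers, not PyIceberg or Lakekeeper APIs.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class LatencyBreakdown:
    client_wall_s: float  # measured around the request in the client process
    server_s: float  # request->response time reported in the catalog's own logs

    @property
    def outside_server_s(self) -> float:
        """Time not explained by server-side processing.

        Clamped at zero because the two numbers come from different machines
        and log resolutions, so small negative differences are noise.
        """
        return max(self.client_wall_s - self.server_s, 0.0)


def measure(call: Callable[[], T]) -> Tuple[T, float]:
    """Run `call` and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for the commit request; in practice you would wrap
    # the real call and take server_s from the catalog's logs.
    _, wall = measure(lambda: time.sleep(0.05))
    breakdown = LatencyBreakdown(client_wall_s=wall, server_s=0.03)  # illustrative value
    print(f"client wall: {wall:.3f}s, outside server: {breakdown.outside_server_s:.3f}s")
```

A large outside_server_s would point at the network or client side; a small one would suggest the time is spent in the catalog itself.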
I wonder if @c-thiel has any thoughts about the best way to profile this from the Lakekeeper side? My guess would be to enable tracing logs.
Question
Hello PyIceberg devs,
I successfully set up Lakekeeper as the catalog and connected it to ADLS Gen2 storage.
I found that table.append takes a long time to finish (2+ seconds).
Here is my question:
(PS: I already dug into the source code and saw that there are both append and fast_append.)
Here is my code
Thanks for your help in advance :)