Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/peter.ke/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/Users/peter.ke/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/Users/peter.ke/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/peter.ke/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/peter.ke/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<stdin>", line 2, in <lambda>
  File "/Users/peter.ke/.pyenv/versions/3.10.13/lib/python3.10/site-packages/deltalake/writer.py", line 302, in write_deltalake
    table.update_incremental()
  File "/Users/peter.ke/.pyenv/versions/3.10.13/lib/python3.10/site-packages/deltalake/table.py", line 1258, in update_incremental
    self._table.update_incremental()
RuntimeError: Already borrowed
More details:
Cause of error:
`write_deltalake` calls the Rust function `write_to_deltalake` to perform the write, which borrows the `RawDeltaTable` immutably while releasing the GIL. Because the borrow is immutable, writes from multiple threads should in theory be fine.
However, `write_deltalake` also calls `table.update_incremental()`, which borrows the `RawDeltaTable` mutably.
Since the GIL is released during `write_to_deltalake`, another thread running `write_deltalake` tries to mutably borrow the `RawDeltaTable` while the first thread still holds its immutable borrow, and fails with `RuntimeError: Already borrowed`.
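The borrow conflict can be reproduced without delta-rs. PyO3 tracks shared and exclusive borrows of each Rust object exposed to Python at runtime, and a failed exclusive borrow surfaces in Python as `RuntimeError: Already borrowed`. The sketch below is a toy, pure-Python model of that bookkeeping under stated assumptions: `BorrowCell`, `write_to_deltalake` and `update_incremental` are illustrative stand-ins, not the real PyO3 or deltalake code, and a `threading.Event` plays the part of the long write during which the GIL is released.

```python
import threading
from contextlib import contextmanager


class BorrowCell:
    """Toy model of a runtime borrow flag (illustrative, not PyO3's code)."""

    def __init__(self, value):
        self.value = value
        self._shared = 0                    # number of live immutable borrows
        self._exclusive = False             # True while a mutable borrow is live
        self._flag_lock = threading.Lock()  # protects the two counters above

    @contextmanager
    def borrow(self):
        with self._flag_lock:
            if self._exclusive:
                raise RuntimeError("Already mutably borrowed")
            self._shared += 1
        try:
            yield self.value
        finally:
            with self._flag_lock:
                self._shared -= 1

    @contextmanager
    def borrow_mut(self):
        with self._flag_lock:
            if self._exclusive or self._shared:
                raise RuntimeError("Already borrowed")
            self._exclusive = True
        try:
            yield self.value
        finally:
            with self._flag_lock:
                self._exclusive = False


def write_to_deltalake(cell, started, release):
    # Stand-in for the Rust write: the immutable borrow lives for the
    # whole write, while other Python threads keep running.
    with cell.borrow():
        started.set()
        release.wait()


def update_incremental(cell):
    # Stand-in for table.update_incremental(): needs a mutable borrow.
    with cell.borrow_mut() as state:
        state["version"] += 1


def demo():
    table = BorrowCell({"version": 0})
    started, release = threading.Event(), threading.Event()
    writer = threading.Thread(target=write_to_deltalake,
                              args=(table, started, release))
    writer.start()
    started.wait()                  # first thread is now mid-write
    try:
        update_incremental(table)   # second thread's refresh
        error = None
    except RuntimeError as exc:
        error = str(exc)
    finally:
        release.set()               # let the write finish
        writer.join()
    update_incremental(table)       # succeeds once the borrow is released
    return error, table.value["version"]
```

Calling `demo()` returns `("Already borrowed", 1)`: the refresh fails only while the other thread's immutable borrow is alive, and succeeds once the write has finished. Two overlapping `borrow()` calls, by contrast, never conflict, which matches the expectation that the write itself is safe to run concurrently.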
From my brief investigation, I see two potential solutions:
First, the Rust API exposed to Python could be refactored to be more immutable so that it fits better in a multithreaded environment. For example, on the Rust side, `update_incremental` could return a new `DeltaTable` rather than mutating the existing one; the same applies to other internals such as `DeltaTableState` or `Snapshot`.
Performance and memory use should be similar, since cloning a `Snapshot` seems relatively cheap given that the bulky `files` data uses Arrow. However, this looks like a fairly large refactor, because many of the internals expose mutable APIs.
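A minimal Python sketch of the idea behind this first option, assuming the table state were an immutable value. `Snapshot`, `update_incremental` and `TableHandle` here are hypothetical names chosen for illustration, not the delta-rs API. Refreshing produces a new snapshot, so a thread still working against the old one is never invalidated; in this sketch only publishing the next snapshot is serialized.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical immutable table state; cannot be modified after creation."""
    version: int
    files: tuple[str, ...]


def update_incremental(snapshot: Snapshot, new_files: list[str]) -> Snapshot:
    # Returns a new snapshot instead of mutating the one passed in,
    # so any thread still holding `snapshot` is unaffected.
    return replace(snapshot,
                   version=snapshot.version + 1,
                   files=snapshot.files + tuple(new_files))


class TableHandle:
    """Points at the latest snapshot; readers never block, publishers are serialized."""

    def __init__(self, snapshot: Snapshot):
        self._snapshot = snapshot
        self._lock = threading.Lock()

    def current(self) -> Snapshot:
        # Plain attribute read: the returned snapshot never changes afterwards.
        return self._snapshot

    def publish(self, new_files: list[str]) -> Snapshot:
        # Build and install the next snapshot under a short lock.
        with self._lock:
            self._snapshot = update_incremental(self._snapshot, new_files)
            return self._snapshot


# Usage: eight concurrent "writes" against one shared handle.
handle = TableHandle(Snapshot(version=0, files=()))
old = handle.current()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda i: handle.publish([f"part-{i}.parquet"]), range(8)))
```

After the pool finishes, `handle.current()` is at version 8 while `old` still reports version 0 with no files, which is the property that would let a long-running write keep using its own snapshot.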
Second, remove `update_incremental` from `write_deltalake` (or add a flag to disable it). We would just need to ensure that `write_to_deltalake` returns an updated `RawDeltaTable` so that `table._table` can be kept up to date.
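The second option could look roughly like the following Python sketch. It assumes, hypothetically, that the Rust write returned the post-write table state; `RawTableState`, `Table`, `write` and its `refresh` flag are illustrative names, not the current deltalake API. With `refresh=False` the wrapper never calls `update_incremental`, so no mutable borrow is requested.

```python
from dataclasses import dataclass, field


@dataclass
class RawTableState:
    """Hypothetical stand-in for the Rust-side RawDeltaTable."""
    version: int
    files: list[str] = field(default_factory=list)


@dataclass
class Table:
    """Hypothetical stand-in for the Python DeltaTable wrapper."""
    _table: RawTableState
    refresh_calls: int = 0

    def update_incremental(self) -> None:
        # In the real binding this needs a mutable borrow; here we only count calls.
        self.refresh_calls += 1


def write_to_deltalake(raw: RawTableState, files: list[str]) -> RawTableState:
    # Hypothetical: the write returns the post-commit state instead of
    # leaving the caller to re-read the log afterwards.
    return RawTableState(version=raw.version + 1, files=raw.files + files)


def write(table: Table, files: list[str], *, refresh: bool = True) -> None:
    new_raw = write_to_deltalake(table._table, files)
    table._table = new_raw          # keep the wrapper in sync from the return value
    if refresh:
        table.update_incremental()  # optional refresh, kept as the default behavior
```

Defaulting `refresh` to `True` keeps today's behavior, while callers that already know their table instance is up to date can opt out.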
I believe you could get around it with multiprocessing (using spawn).
True, that does work. `DeltaTable` is not picklable, though, so it's a bit annoying. Since `write_deltalake` releases the GIL, I believe the intention is to support multithreaded writes.
I think it's best to make the `update_incremental` call optional; it seems unnecessary to always incur this overhead when the caller already knows it has an up-to-date table instance.
Environment
Delta-rs version: 0.20.1
Binding: python
Environment:
Bug
What happened:
Using the same `DeltaTable` across multiple threads in `write_deltalake` causes the error. This is not an issue if we use the table URI directly.
What you expected to happen:
Writes should work in a multithreaded environment.
How to reproduce it:
produces