Fail to read delta table on mounted disk #1189
Comments
Apparently, the current version (0.7.0) is unable to read a table, locally or over a network, if the path contains characters such as spaces, which get percent-encoded. This issue is related to this issue. @wjones127, I attempted some hotfixes for my use case, which can be found in that PR. Please let me know if you would like me to open this PR in the main repo, as I would be eager to help. The reason it is not working currently is that the My use case is a little more complex, as the path to the mounted folder does not contain any "bad" characters. However, when this path is expanded to some network server address, it contains URL-encoded characters. Therefore, I had to do something with the
That sounds like it needs to be cleaned up. Possibly related to #1079 too. I'll take a look at this soon.
So I think the problem is that when we load a delta table, we take the path (as a string) and convert it into a Url (which percent-encodes values). This makes sense for object stores, but not for local file systems. I think we need to think about a better way to handle this. cc @roeap
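The effect described above can be reproduced with the Python standard library alone. This is only a sketch of the mechanism, not delta-rs code: the path `/mnt/my data/table` is made up for illustration, and `urllib.parse` stands in for the Rust `Url` type.

```python
from urllib.parse import quote, unquote, urlparse

# Hypothetical local table root containing a space (not taken from the issue).
local_path = "/mnt/my data/table"

# Turning the path into a URL percent-encodes the space.
url = "file://" + quote(local_path)

# The path component of the parsed URL is still percent-encoded ...
encoded_path = urlparse(url).path

# ... so it only matches the directory on disk after it is decoded again.
decoded_path = unquote(encoded_path)

print(url)
print(encoded_path)
print(decoded_path)
```

If the encoded form is handed to the local file system without the decoding step, the lookup targets a directory literally named `my%20data`, which would explain a "file not found" style failure even though the file exists.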
I'm thinking we should instead use a type like:
enum TableUri {
    LocalPath(Path),
    RemoteUrl(Url)
}
What do you think of that, @roeap?
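To make the proposal above concrete, here is a rough Python analogue of such a two-variant type. Everything in it is hypothetical and not part of the deltalake API: the class names mirror the enum variants, `parse_table_uri` is an invented helper, and the sketch assumes POSIX-style paths only.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Union
from urllib.parse import unquote, urlparse


@dataclass(frozen=True)
class LocalPath:
    path: PurePosixPath  # stored decoded, never percent-encoded


@dataclass(frozen=True)
class RemoteUrl:
    url: str  # stored as given; encoding is left to the object store layer


TableUri = Union[LocalPath, RemoteUrl]


def parse_table_uri(location: str) -> TableUri:
    """Classify a table location (hypothetical helper, POSIX paths assumed)."""
    parsed = urlparse(location)
    if parsed.scheme == "":
        # Plain file system path: keep it verbatim.
        return LocalPath(PurePosixPath(location))
    if parsed.scheme == "file":
        # file:// URL: decode the path so it matches the directory on disk.
        return LocalPath(PurePosixPath(unquote(parsed.path)))
    # Any other scheme is treated as a remote object store URL.
    return RemoteUrl(location)
```

With this split, `parse_table_uri("/mnt/my data/table")` and `parse_table_uri("file:///mnt/my%20data/table")` both yield the same `LocalPath`, while something like `s3://bucket/table` stays a `RemoteUrl`.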
Huh, actually it might be that https://docs.rs/object_store/0.5.5/object_store/path/struct.Path.html#path-safety But it seems like we should be able to support arbitrary characters above the root, as long as we are passing them along correctly?
@wjones127 - sorry for taking so long, this somehow slipped my attention. I think if we just want to support spaces in the table root, we should be able to get it to work, more or less as you described. As you mentioned, object store is quite strict about what paths should look like, but right now I have no feeling for how limiting that is, i.e. whether other writers can create something in an object store that we would not be able to read. To me, all of this eventually points to handling absolute paths in the log as well, since this will require different path handling quite deep in the log handling. While I have not yet really thought about this problem, I also haven't had an idea in that area that resonated.
Environment
Delta-rs version: 0.7.0 (latest main)
Binding: Python
Environment:
Bug
What happened:
I'm trying to read a Delta table that is located on a local server.
If we run the following code:
We will get the following error:
Traceback
The *.parquet file is present.
What you expected to happen:
Successful read
How to reproduce it:
More details:
Worth noting:
file://) in url-path, as suggested in this issue, the code will make some more progress (it will find and read all *.parquet files), but will fail with a SIGABRT.
From (1) I'm guessing that there is broken logic somewhere in the deltalake Python wrapper, because (looking at the traceback):
File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file
is a call from PyArrow to a SystemHandler, which is DeltaFileSystemHandler