Incompatibility with hdfs.open #188
Comments
It seems arrow's …
Looks like we need to implement the second argument of file.seek in …
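For context, the second argument of `seek` is `whence`, which selects what the offset is measured from: the start of the file (0), the current position (1), or the end (2). A Parquet file ends with a 4-byte little-endian footer length followed by the `PAR1` magic, so a reader usually finds the footer by seeking relative to the end. The sketch below shows that pattern on an in-memory `io.BytesIO`; the byte contents are made up for illustration, and this is not fastparquet's actual reading code.

```python
import io
import struct

# Made-up stand-in for a Parquet file: magic, data, footer metadata,
# 4-byte little-endian footer length, magic.
footer = b"fake-footer-metadata"
data = (
    b"PAR1"
    + b"column-chunks"
    + footer
    + struct.pack("<i", len(footer))
    + b"PAR1"
)
f = io.BytesIO(data)

# whence=2 (io.SEEK_END): the offset counts back from the end of the file.
f.seek(-8, io.SEEK_END)
footer_len = struct.unpack("<i", f.read(4))[0]
magic = f.read(4)

# Seek to the start of the footer, again relative to the end.
f.seek(-(8 + footer_len), io.SEEK_END)
footer_bytes = f.read(footer_len)

# whence=0 (io.SEEK_SET) is absolute; whence=1 (io.SEEK_CUR) is relative
# to the current position.
f.seek(4, io.SEEK_SET)
f.seek(6, io.SEEK_CUR)
pos = f.tell()
```

A file object whose `seek` accepts only one argument cannot serve the two calls that use `io.SEEK_END`, which would be consistent with the failure discussed here.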
@martindurant hdfs3 worked and I'll keep it in mind as a fallback method. But I think I'm using pyarrow directly, because it was also 3 times faster on my parquet file. But thanks :).
Would be interested to see your benchmark - the types of data, and profiling you might do (if you have the time and motivation). If you are happy as things are, please close this issue.
I submitted a patch for the seek issue; I will test it out with fastparquet to make sure all is a-OK
….seek

I still need to validate this against the use case in dask/fastparquet#188

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#907 from wesm/ARROW-1287 and squashes the following commits:

933f3f6 [Wes McKinney] Add testing script for checking thirdparty library against pyarrow.HdfsClient
423ca87 [Wes McKinney] Implement whence argument for pyarrow.NativeFile.seek
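As a rough illustration of what supporting a whence argument involves, the sketch below wraps a file object whose `seek` only takes an absolute position and translates `(offset, whence)` into that absolute position. Both classes are hypothetical and written only for this example; this is not the code from the Arrow patch, and the wrapped object is assumed to expose `tell()`, `size()` and `read()`.

```python
import io


class AbsoluteOnlyFile:
    """Hypothetical file whose seek() accepts only an absolute position."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def seek(self, position):
        self._pos = position

    def tell(self):
        return self._pos

    def size(self):
        return len(self._data)

    def read(self, nbytes=None):
        end = len(self._data) if nbytes is None else self._pos + nbytes
        chunk = self._data[self._pos:end]
        self._pos += len(chunk)
        return chunk


class WhenceAdapter:
    """Hypothetical wrapper that adds a whence argument to seek()."""

    def __init__(self, raw):
        self._raw = raw

    def seek(self, offset, whence=io.SEEK_SET):
        # Resolve the requested position to an absolute offset.
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._raw.tell() + offset
        elif whence == io.SEEK_END:
            target = self._raw.size() + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if target < 0:
            raise ValueError("negative seek position: %d" % target)
        self._raw.seek(target)
        return target

    def tell(self):
        return self._raw.tell()

    def read(self, nbytes=None):
        return self._raw.read(nbytes)


wrapped = WhenceAdapter(AbsoluteOnlyFile(b"0123456789"))
end_pos = wrapped.seek(-3, io.SEEK_END)  # three bytes before the end
tail = wrapped.read()
wrapped.seek(2)                          # absolute
cur_pos = wrapped.seek(3, io.SEEK_CUR)   # relative to current position
middle = wrapped.read(2)
```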
Hi,
I tried to read a parquet file directly from hdfs using pyarrow, but if I set the open_with to hdfs.open, it doesn't seem to work: