tustvold changed the title from "LocalFileSystem::list returning different results from different OS" to "Return sorted results from ObjectStore::list" on Mar 29, 2023
The returned order is not defined, because filesystems and object stores have different notions of what sorting means: lexicographic by full path, or by path segment. Additionally, many filesystems provide no guarantees on output ordering at all; in fact, Windows is the only one that does, IIRC.
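To illustrate why "sorted" is ambiguous here, the following is a minimal sketch using only the Rust standard library. The helper names (cmp_full_path, cmp_by_segment, sorted_by) and the example paths are hypothetical and are not part of object_store; the sketch only shows that the two notions of ordering can disagree on the same pair of paths.

```rust
use std::cmp::Ordering;

/// Compare two paths as plain strings (lexicographic by full path).
fn cmp_full_path(a: &str, b: &str) -> Ordering {
    a.cmp(b)
}

/// Compare two paths segment by segment, splitting on '/'.
fn cmp_by_segment(a: &str, b: &str) -> Ordering {
    a.split('/').cmp(b.split('/'))
}

/// Return a sorted copy of `paths` using the given comparison function.
fn sorted_by(paths: &[&str], cmp: fn(&str, &str) -> Ordering) -> Vec<String> {
    let mut out: Vec<String> = paths.iter().map(|p| p.to_string()).collect();
    out.sort_by(|a, b| cmp(a, b));
    out
}

fn main() {
    // Hypothetical paths: '-' sorts before '/' as a byte, but the segment
    // "data" sorts before the segment "data-old".
    let paths = ["data/part.parquet", "data-old/part.parquet"];
    println!("full path:  {:?}", sorted_by(&paths, cmp_full_path));
    println!("by segment: {:?}", sorted_by(&paths, cmp_by_segment));
}
```

The two calls produce opposite orders for these two paths, so a caller that needs a stable order has to pick one comparison explicitly.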
If you require a consistent sort order, I would recommend collecting the results (or using list_with_delimiter) and sorting the output.
This definitely should be highlighted more clearly in the docs.
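As a rough sketch of the "collect, then sort" suggestion, the example below uses std::fs::read_dir as a stand-in for a local listing, since its iteration order is likewise platform and filesystem dependent. The function name list_sorted and the scratch directory name are assumptions made for illustration. With object_store the same idea would presumably apply: collect the listing into a Vec and sort by each entry's location before using it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// List the entries directly under `dir` and return them in sorted order,
/// so the result does not depend on the order the OS happens to return.
fn list_sorted(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut entries = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;
    // Explicit sort: this is the step that makes the order deterministic.
    entries.sort();
    Ok(entries)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch directory, created only for this demonstration.
    let dir = std::env::temp_dir().join("list_order_demo");
    fs::create_dir_all(&dir)?;
    for name in ["b.parquet", "a.parquet", "c.parquet"] {
        fs::write(dir.join(name), b"")?;
    }
    for path in list_sorted(&dir)? {
        println!("{}", path.display());
    }
    fs::remove_dir_all(&dir)
}
```

Sorting after collection costs one extra pass over the listing, but it removes any dependence on the platform's enumeration order.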
tustvold changed the title from "Return sorted results from ObjectStore::list" to "Document ObjectStore::list Ordering" on Mar 29, 2023
Describe the bug
In DataFusion, when listing all files (https://github.com/apache/arrow-datafusion/blob/c8a3d589889dd1e67047de89db8b4ff56f90f04c/datafusion/core/src/datasource/listing/url.rs#L151) using a LocalFileSystem object store, the result differs depending on the OS.
To Reproduce
Having a folder:
and requesting to list the content of the folder using:
on Windows/Ubuntu the result is:
but on macOS Ventura:
Expected behavior
We expect the result to be the same on every OS. This code is called when inferring the schema (https://github.com/apache/arrow-datafusion/blob/c8a3d589889dd1e67047de89db8b4ff56f90f04c/datafusion/core/src/datasource/listing/table.rs#L431), and the ordering of multiple files is important because inference merges the schemas of all the files.