Different schemas when inferring from local system in different OS #5779
Labels
bug
Something isn't working
good first issue
Good for newcomers
help wanted
Extra attention is needed
Describe the bug
When inferring a schema, list_all_files uses an object store to list the files, and no sort order is applied.
When the object store is a LocalFileSystem, there is no guarantee of any file ordering (the list returned on macOS is ordered differently from the one returned on Windows). This means that the inferred schema can differ for the same set of files.
We contacted the object store maintainers (apache/arrow-rs#3975), who pointed out that the fix should be implemented in the caller of the method, applying a sort of some kind to keep results consistent across file systems.
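To illustrate the caller-side fix suggested above, here is a minimal sketch that uses only the Rust standard library rather than the object store API. The helper name list_files_sorted is hypothetical and is not part of DataFusion or arrow-rs; it only shows that sorting the listing before use makes the first file independent of the platform's directory order.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not a DataFusion or object_store API):
/// list the regular files directly inside `dir` in a deterministic order.
pub fn list_files_sorted(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            files.push(path);
        }
    }
    // `read_dir` makes no ordering guarantee across platforms,
    // so sort lexicographically before anything depends on the order.
    files.sort();
    Ok(files)
}
```

With this ordering, a directory containing file1.parquet and file3.parquet yields file1.parquet first on every OS, so whichever file drives schema inference is the same everywhere.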
To Reproduce
Given two Parquet files in the filesystem with the schema:
and executing:
the result on macOS Ventura:
the first file picked up was file3.parquet
and on Windows:
the first file picked up was file1.parquet
Expected behavior
The same schema regardless of the OS on which the code runs. A sort should be enforced, or callers should at least be able to pass a sort function.
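The second option, letting the caller pass a sort function, could look roughly like the sketch below. The name order_files_by and its signature are assumptions made for illustration, not an existing or proposed DataFusion API.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: order listed file paths with a caller-supplied
/// comparator, so the caller decides which file is considered first.
pub fn order_files_by<F>(mut paths: Vec<String>, compare: F) -> Vec<String>
where
    F: FnMut(&String, &String) -> Ordering,
{
    // `sort_by` is a stable sort, so paths that compare equal
    // keep their relative order from the input listing.
    paths.sort_by(compare);
    paths
}
```

A caller wanting plain lexicographic order would pass `|a, b| a.cmp(b)`, while `|a, b| b.cmp(a)` would put file3.parquet ahead of file1.parquet.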
Additional context
No response