File paths get duplicated by list_files_by_fsspec pipeline if folder path starts with az://
#840
Labels
good first issue
🐛 Describe the bug
Context
For reading files from Azure Blob storage Gen2, fsspec allows both path prefixes abfs:// and az:// as synonyms. Paths starting with abfs:// work fine for us, but paths starting with az:// result in duplicated output when passed to the list_files_by_fsspec pipeline.
This first showed up in #836.
Example
Output:
instead of the correct one
Possible reason
This probably has to do with how FSSpecFileListerIterDataPipe.__iter__ decides if the path is local: when the path starts with az, the variable fs.protocol is still abfs, so root.startswith(protocol) is false and is_local is true.
Perhaps we'd need to find a different way of checking if the path is local, one that does not rely on matching the beginning of the path with fs.protocol.
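To make the suspected cause concrete, here is a minimal sketch in plain Python. It does not import torchdata or fsspec. The helper is_local_by_prefix only imitates the check as described above, and is_local_by_scheme is a hypothetical alternative, not the library's code or an agreed fix. The container and folder names are made up for illustration.

```python
def is_local_by_prefix(root: str, fs_protocol: str) -> bool:
    """Imitates the check described above (an assumption about its shape):
    the path counts as local when it does not start with fs.protocol."""
    return not root.startswith(fs_protocol)


def is_local_by_scheme(root: str) -> bool:
    """Hypothetical alternative: look at the path's own scheme instead of
    comparing it with fs.protocol. A path without '://' is treated as local."""
    if "://" not in root:
        return True
    scheme = root.split("://", 1)[0]
    return scheme == "file"


# fs.protocol is reported as "abfs" for both prefixes, per the description above.
fs_protocol = "abfs"

# Prefix check: abfs:// is recognised as remote, az:// is wrongly seen as local.
print(is_local_by_prefix("abfs://container/folder", fs_protocol))
print(is_local_by_prefix("az://container/folder", fs_protocol))

# Scheme check: both prefixes are treated as remote, plain paths as local.
print(is_local_by_scheme("abfs://container/folder"))
print(is_local_by_scheme("az://container/folder"))
print(is_local_by_scheme("/tmp/data"))
```

The scheme-based check is only one option. It would still need to be verified against the other filesystems that list_files_by_fsspec supports.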
Versions
Collecting environment information...
PyTorch version: 1.14.0.dev20221018
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 12.6 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.102)
CMake version: version 3.22.1
Libc version: N/A
Python version: 3.9.13 (main, Oct 13 2022, 16:12:19) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-12.6-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy==0.982
[pip3] mypy-extensions==0.4.3
[pip3] torch==1.14.0.dev20221018
[conda] pytorch 1.14.0.dev20221018 py3.9_0 pytorch-nightly