Description
What happened?
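When I pass the same open fsspec S3 file object to open_dataset twice, the second call raises an error: "file-like object read/write pointer not at the start of the file". If the file is opened in text mode (mode='r'), the call fails with a UnicodeDecodeError instead.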
Here's a Dockerfile I used for the environment:
FROM condaforge/mambaforge:4.12.0-0
RUN mamba install -y --strict-channel-priority -c conda-forge python=3.10 dask h5netcdf xarray fsspec s3fs
Input1:
import fsspec
import xarray as xr
fs = fsspec.filesystem('s3', anon=True)
fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
data = fs.open(fp)
xr.open_dataset(data, engine='h5netcdf', chunks={})
xr.open_dataset(data, engine='h5netcdf', chunks={})
Output1:
Traceback (most recent call last):
  File "//example.py", line 26, in <module>
    xr.open_dataset(data, engine='h5netcdf', chunks={})
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/api.py", line 531, in open_dataset
    backend_ds = backend.open_dataset(
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 389, in open_dataset
    store = H5NetCDFStore.open(
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 157, in open
    magic_number = read_magic_number_from_file(filename)
  File "/opt/conda/lib/python3.10/site-packages/xarray/core/utils.py", line 645, in read_magic_number_from_file
    raise ValueError(
ValueError: cannot guess the engine, file-like object read/write pointer not at the start of the file, please close and reopen, or use a context manager
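The ValueError points at a position check on the file-like object. Here is a minimal sketch of that behaviour, using an in-memory io.BytesIO in place of the S3 file. The check_magic_number helper is hypothetical: it is a simplified stand-in for read_magic_number_from_file, not a copy of xarray's code, and it assumes the first open_dataset call leaves the pointer past byte 0.

```python
import io

# First 8 bytes of an HDF5 (netCDF4) file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def check_magic_number(fileobj, count=8):
    """Hypothetical stand-in for the check that raises in the traceback above."""
    if fileobj.tell() != 0:
        raise ValueError(
            "file-like object read/write pointer not at the start of the file"
        )
    magic = fileobj.read(count)
    fileobj.seek(0)  # restore the position for the next reader
    return magic


# In-memory stand-in for the object returned by fs.open(fp).
data = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 64)

first = check_magic_number(data)  # pointer is at 0, so this succeeds
data.read(16)                     # simulate a backend reading from the handle

try:
    check_magic_number(data)      # pointer is now at 16, so this raises
    second_error = None
except ValueError as exc:
    second_error = str(exc)

data.seek(0)                      # rewinding satisfies the check again
after_rewind = check_magic_number(data)
```

The sketch only shows why a reused handle trips the check. It does not establish whether rewinding a handle that a lazily loaded dataset still holds is safe.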
----- INVALID EXAMPLE 2 -----
Input2:
import fsspec
import xarray as xr
fs = fsspec.filesystem('s3', anon=True)
fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
data = fs.open(fp, mode='r')
xr.open_dataset(data, engine='h5netcdf', chunks={})
xr.open_dataset(data, engine='h5netcdf', chunks={})
Output2:
Traceback (most recent call last):
  File "//example.py", line 25, in <module>
    xr.open_dataset(data, engine='h5netcdf', chunks={})
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/api.py", line 531, in open_dataset
    backend_ds = backend.open_dataset(
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 389, in open_dataset
    store = H5NetCDFStore.open(
  File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 157, in open
    magic_number = read_magic_number_from_file(filename)
  File "/opt/conda/lib/python3.10/site-packages/xarray/core/utils.py", line 650, in read_magic_number_from_file
    magic_number = filename_or_obj.read(count)
  File "/opt/conda/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
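This traceback ends in codecs.py, which suggests that mode='r' yields a text-mode handle that tries to decode the file contents as UTF-8. Byte 0x89 is the first byte of the HDF5 file signature and is not a valid UTF-8 start byte. The sketch below reproduces the same message with the standard library only; io.TextIOWrapper over io.BytesIO is an assumed stand-in for the text-mode object that fsspec returns.

```python
import io

# First 8 bytes of an HDF5 (netCDF4) file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Binary mode: the raw bytes come back unchanged.
binary_handle = io.BytesIO(HDF5_SIGNATURE)
magic_bytes = binary_handle.read(8)

# Text mode (assumed equivalent of mode='r'): reading forces a UTF-8 decode.
text_handle = io.TextIOWrapper(io.BytesIO(HDF5_SIGNATURE), encoding="utf-8")
try:
    text_handle.read(8)
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = str(exc)
```

Here the failure occurs on the very first read, before any question of pointer position arises, so it looks like a separate problem from the one in Output1.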
----- INVALID EXAMPLE 2 -----
What did you expect to happen?
I expect both calls to open_dataset to succeed and return the same result. Opening a fresh file object before each call runs without errors:
import fsspec
import xarray as xr
fs = fsspec.filesystem('s3', anon=True)
fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
data = fs.open(fp)
xr.open_dataset(data, engine='h5netcdf', chunks={})
data = fs.open(fp)
xr.open_dataset(data, engine='h5netcdf', chunks={})
Minimal Complete Verifiable Example
No response
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
No response
Anything else we need to know?
I see the same error mentioned in other issues such as #3991, but in that case it was determined to be a problem with the input data.
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-348.20.1.el8_5.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None
xarray: 2022.6.0rc0
pandas: 1.4.3
numpy: 1.23.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: 1.0.1
h5py: 3.7.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.7.0
distributed: 2022.7.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.0.4
conda: 4.13.0
pytest: None
IPython: None
sphinx: None