ValueError: Got more bytes so far (>2602512) than requested (2594828) when using argo_loader
#67
Comments
Hi @ctroupin! An issue about this error was raised some weeks ago. I think @gmaze implemented something about it in #28, which is merged in master. For the record, can you output your versions of erddapy and fsspec? (In the last release, they are not included in the `argopy.show_versions()` report.)

```python
import erddapy
print("erddapy", erddapy.__version__)
import fsspec
print("fsspec", fsspec.__version__)
```

Thanks!
Hello Kevin, thanks!
OK, so I've upgraded to version 1.6 and the error still happens.

```python
import erddapy
print("erddapy", erddapy.__version__)
import fsspec
print("fsspec", fsspec.__version__)
```

gives

```
erddapy 0.5.3
fsspec 0.7.4
```

so I will probably install directly from the repository.

Edit: after installing the master branch, I'm also testing the parallel option, for instance (copied from the doc):

```python
box = [-60, 0, 0.0, 60.0, 0.0, 500.0, "2007", "2010"]
# Init a parallel fetcher:
loader_par = ArgoDataFetcher(src='erddap',
                             parallel=True,
                             ).region(box)
```

returns

```
TypeError: init() got an unexpected keyword argument 'parallel'
```

I guess I'm doing something wrong with the update.
Hello,

```python
from argopy import DataFetcher as ArgoDataFetcher
ds = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012']).to_xarray()
```

returns:

```
AttributeError                            Traceback (most recent call last)
<ipython-input-2-52b7d204dc29> in <module>
      1 from argopy import DataFetcher as ArgoDataFetcher
----> 2 ds = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012']).to_xarray()

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    270             raise InvalidFetcher(" Initialize an access point (%s) first." %
    271                                  ",".join(self.Fetchers.keys()))
--> 272         xds = self.fetcher.to_xarray(**kwargs)
    273         xds = self.postproccessor(xds)
    274         return xds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self, errors)
    417         if not self.parallel:
    418             if len(self.uri) == 1:
--> 419                 ds = self.fs.open_dataset(self.uri[0])
    420             else:
    421                 ds = self.fs.open_mfdataset(

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/stores/filesystems.py in open_dataset(self, url, *args, **kwargs)
    464         # with self.fs.open(url) as of:
    465         #     ds = xr.open_dataset(of, *args, **kwargs)
--> 466         data = self.fs.cat_file(url)
    467         ds = xr.open_dataset(data, *args, **kwargs)
    468         if "source" not in ds.encoding:

AttributeError: 'HTTPFileSystem' object has no attribute 'cat_file'
```

I also tried a simpler request:

```python
ds = ArgoDataFetcher().float(6902746).to_xarray()
```

which returns the same error. I guess it is better to come back to a released version (not the master).
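The missing `cat_file` suggests the installed fsspec predates that method, which the master branch of argopy relies on. A quick, hedged way to check (the `AbstractFileSystem` attribute check is an assumption about where fsspec defines the method, not something from this thread):

```python
import fsspec
from fsspec import AbstractFileSystem

print("fsspec", fsspec.__version__)
# argopy's master branch calls fs.cat_file(url); if this prints False,
# the installed fsspec is too old for that code path and needs upgrading.
print("has cat_file:", hasattr(AbstractFileSystem, "cat_file"))
```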
I'm back on the released version;

```python
ds = ArgoDataFetcher().float(6902746).to_xarray()
```

now works (as expected). As you suggested, I tried the parallel fetcher:

```python
# Define a box to load (large enough to trigger chunking):
box = [-60, -30, 40.0, 60.0, 0.0, 100.0, "2007-01-01", "2007-04-01"]

# Instantiate a parallel fetcher:
loader_par = ArgoDataFetcher(src='erddap', parallel=True).region(box)
```

and got a `TypeError`:

```
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8183107cd940> in <module>
      3
      4 # Instantiate a parallel fetcher:
----> 5 loader_par = ArgoDataFetcher(src='erddap', parallel=True).region(box)

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in region(self, box)
    193         """
    194         if 'region' in self.Fetchers:
--> 195             self.fetcher = self.Fetchers['region'](box=box, **self.fetcher_options)
    196             self._AccessPoint = 'region'  # Register the requested access point
    197         else:

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in __init__(self, ds, cache, cachedir, **kwargs)
     87         self.dataset_id = OPTIONS['dataset'] if ds == '' else ds
     88         self.server = api_server
---> 89         self.init(**kwargs)
     90         self._init_erddapy()
     91

TypeError: init() got an unexpected keyword argument 'parallel'
```

Last thing I tried: splitting my request into shorter periods of time, to avoid the

```
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
```

Any hint is obviously welcome.
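Splitting one long request into several shorter time windows can also be done by hand. Here is a minimal sketch of such a splitter; `month_chunks` is a hypothetical helper written for this comment, not part of argopy, and it assumes first-of-the-month start dates:

```python
from datetime import date

def month_chunks(start: date, end: date, step: int = 3):
    """Split [start, end) into consecutive sub-periods of `step` months,
    returned as (chunk_start, chunk_end) pairs. Assumes first-of-the-month
    dates; each pair can then drive one smaller ERDDAP request."""
    chunks = []
    cur = start
    while cur < end:
        m = cur.month - 1 + step           # zero-based month index after the step
        nxt = date(cur.year + m // 12, m % 12 + 1, 1)
        nxt = min(nxt, end)                # clamp the last chunk to `end`
        chunks.append((cur, nxt))
        cur = nxt
    return chunks

print(month_chunks(date(2019, 1, 1), date(2019, 12, 1)))
```

Each pair could then fill the two date slots of a region box, with the resulting datasets concatenated via `xarray.concat`; whether this actually avoids the `ConnectionError` depends on the server.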
Hey @ctroupin

There is some versioning weirdness with xarray and fsspec, so I would also advise not to go higher than the versions above. Also:

That's exactly what the parallel option is doing for you.
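For intuition, the idea behind a parallel fetcher can be sketched with the standard library: issue one small request per chunk, concurrently, then combine the results. Everything below is a stand-in for illustration; `fetch_chunk` and the URLs are hypothetical, not argopy code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(url):
    # Stand-in for a real HTTP download of one ERDDAP chunk.
    return {"url": url, "n_points": len(url)}

# Hypothetical per-chunk URLs, e.g. one per sub-box or sub-period:
urls = [f"https://erddap.example/chunk{i}" for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order, so results line up with urls.
    results = list(pool.map(fetch_chunk, urls))

print([r["url"] for r in results])
```

Several small requests are less likely to hit server-side size or timeout limits than one large one, which is the point of the option.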
Well, I started from scratch, but unfortunately bumped into issue #70. Do you recommend working with an older version? Thanks
Hi @ctroupin
Hello @gmaze, I'll try what you suggest, or maybe I'll wait for the new release (quite a busy time right now) ;) Thanks!
Closing, because it seems the problem is solved with the last release or the master version.
Thanks @gmaze, looking forward to using the latest release. I really love the tool!
I'm running a simple extraction of data over a rather small domain. The code works, but for some combinations of parameters I get the `ValueError` mentioned in the title.

MCVE Code Sample
```
ValueError                                Traceback (most recent call last)
in <module>
----> 1 ds2c = argo_loader.region([domain_corse400[0], domain_corse400[1], domain_corse400[2], domain_corse400[3],
      2                            0., 2000., '2019-01-01', '2019-11-30']).to_xarray()

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    220         if self._AccessPoint not in self.valid_access_points:
    221             raise InvalidFetcherAccessPoint(" Initialize an access point (%s) first." % ",".join(self.Fetchers.keys()))
--> 222         xds = self.fetcher.to_xarray(**kwargs)
    223         xds = self.postproccessor(xds)
    224         return xds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self)
    319
    320         # Download data
--> 321         ds = self.fs.open_dataset(self.url)
    322         ds = ds.rename({'row': 'N_POINTS'})
    323

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/stores/fsspec_wrappers.py in open_dataset(self, url, **kwargs)
    251         try:
    252             with self.fs.open(url) as of:
--> 253                 ds = xr.open_dataset(of, **kwargs)
    254             self.register(url)
    255             return ds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    536             engine = _get_engine_from_magic_number(filename_or_obj)
    537         if engine == "scipy":
--> 538             store = backends.ScipyDataStore(filename_or_obj, **backend_kwargs)
    539         elif engine == "h5netcdf":
    540             store = backends.H5NetCDFStore.open(

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in __init__(self, filename_or_obj, mode, format, group, mmap, lock)
    135             )
    136         else:
--> 137             scipy_dataset = _open_scipy_netcdf(
    138                 filename_or_obj, mode=mode, mmap=mmap, version=version
    139             )

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in _open_scipy_netcdf(filename, mode, mmap, version)
     81
     82     try:
---> 83         return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version)
     84     except TypeError as e:  # netcdf3 message is obscure in this case
     85         errmsg = e.args[0]

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in __init__(self, filename, mode, mmap, version, maskandscale)
    279
    280         if mode in 'ra':
--> 281             self._read()
    282
    283     def __setattr__(self, attr, value):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read(self)
    608         self._read_dim_array()
    609         self._read_gatt_array()
--> 610         self._read_var_array()
    611
    612     def _read_numrecs(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read_var_array(self)
    696                 pos = self.fp.tell()
    697                 self.fp.seek(begin)
--> 698                 data = frombuffer(self.fp.read(a_size), dtype=dtype_
    699                                   ).copy()
    700                 data.shape = shape

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in read(self, length)
    276         else:
    277             length = min(self.size - self.loc, length)
--> 278         return super().read(length)
    279
    280     def _fetch_all(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/spec.py in read(self, length)
   1237             # don't even bother calling fetch
   1238             return b""
-> 1239         out = self.cache._fetch(self.loc, self.loc + length)
   1240         self.loc += len(out)
   1241         return out

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/caching.py in _fetch(self, start, end)
    354             self.start = start
    355         else:
--> 356             new = self.fetcher(self.end, bend)
    357         self.cache = self.cache + new
    358

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in _fetch_range(self, start, end)
    328                 cl += len(chunk)
    329                 if cl > end - start:
--> 330                     raise ValueError(
    331                         "Got more bytes so far (>%i) than requested (%i)"
    332                         % (cl, end - start)

ValueError: Got more bytes so far (>2602512) than requested (2594828)
```
Expected Output

When I run the same code, except that I change the end date, then it works.
Versions
Output of `argopy.show_versions()`
```
INSTALLED VERSIONS
commit: None
python: 3.8.0 (default, Dec 10 2019, 10:35:48)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.15.0-120-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3
argopy: 0.1.5
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
setuptools: 49.2.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.16.1
sphinx: 3.1.2
```