ValueError: Got more bytes so far (>2602512) than requested (2594828) when using argo_loader #67

Closed
ctroupin opened this issue Oct 19, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@ctroupin

I'm running a simple extraction of data over a rather small domain. The code works, but for some combinations of parameters I get the ValueError mentioned in the title.

MCVE Code Sample

import argopy
from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher()

domain_corse = (8.14516, 9.9408, 40.716401, 43.31488)
ds = argo_loader.region([domain_corse[0], domain_corse[1], domain_corse[2], domain_corse[3], 
                         0., 2000., '2019-01-01', '2019-11-30']).to_xarray()

ValueError Traceback (most recent call last)
in
----> 1 ds2c = argo_loader.region([domain_corse400[0], domain_corse400[1], domain_corse400[2], domain_corse400[3],
2 0., 2000., '2019-01-01','2019-11-30']).to_xarray()

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
220 if self._AccessPoint not in self.valid_access_points:
221 raise InvalidFetcherAccessPoint(" Initialize an access point (%s) first." % ",".join(self.Fetchers.keys()))
--> 222 xds = self.fetcher.to_xarray(**kwargs)
223 xds = self.postproccessor(xds)
224 return xds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self)
319
320 # Download data
--> 321 ds = self.fs.open_dataset(self.url)
322 ds = ds.rename({'row': 'N_POINTS'})
323

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/stores/fsspec_wrappers.py in open_dataset(self, url, **kwargs)
251 try:
252 with self.fs.open(url) as of:
--> 253 ds = xr.open_dataset(of, **kwargs)
254 self.register(url)
255 return ds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
536 engine = _get_engine_from_magic_number(filename_or_obj)
537 if engine == "scipy":
--> 538 store = backends.ScipyDataStore(filename_or_obj, **backend_kwargs)
539 elif engine == "h5netcdf":
540 store = backends.H5NetCDFStore.open(

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in __init__(self, filename_or_obj, mode, format, group, mmap, lock)
135 )
136 else:
--> 137 scipy_dataset = _open_scipy_netcdf(
138 filename_or_obj, mode=mode, mmap=mmap, version=version
139 )

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in _open_scipy_netcdf(filename, mode, mmap, version)
81
82 try:
---> 83 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version)
84 except TypeError as e: # netcdf3 message is obscure in this case
85 errmsg = e.args[0]

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in __init__(self, filename, mode, mmap, version, maskandscale)
279
280 if mode in 'ra':
--> 281 self._read()
282
283 def __setattr__(self, attr, value):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read(self)
608 self._read_dim_array()
609 self._read_gatt_array()
--> 610 self._read_var_array()
611
612 def _read_numrecs(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read_var_array(self)
696 pos = self.fp.tell()
697 self.fp.seek(begin_)
--> 698 data = frombuffer(self.fp.read(a_size), dtype=dtype_
699 ).copy()
700 data.shape = shape

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in read(self, length)
276 else:
277 length = min(self.size - self.loc, length)
--> 278 return super().read(length)
279
280 def _fetch_all(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/spec.py in read(self, length)
1237 # don't even bother calling fetch
1238 return b""
-> 1239 out = self.cache._fetch(self.loc, self.loc + length)
1240 self.loc += len(out)
1241 return out

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/caching.py in _fetch(self, start, end)
354 self.start = start
355 else:
--> 356 new = self.fetcher(self.end, bend)
357 self.cache = self.cache + new
358

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in _fetch_range(self, start, end)
328 cl += len(chunk)
329 if cl > end - start:
--> 330 raise ValueError(
331 "Got more bytes so far (>%i) than requested (%i)"
332 % (cl, end - start)

ValueError: Got more bytes so far (>2602512) than requested (2594828)

Expected Output

When I run the same code, except that I change the end date:

ds = argo_loader.region([domain_corse[0], domain_corse[1], domain_corse[2], domain_corse[3], 
                         0., 2000., '2019-01-01', '2019-11-29']).to_xarray()

then it works.
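
One way to narrow this down (a diagnostic sketch, not part of the original report) is to take the ERDDAP request built by argopy and read it directly with fsspec, to check whether the size mismatch comes from the ranged HTTP reads rather than from argopy itself. The .fetcher.url attribute is an assumption based on the traceback above; adjust it if your argopy version exposes the request URL differently.

import fsspec

# Build the same request as in the failing case
fetcher = argo_loader.region([domain_corse[0], domain_corse[1],
                              domain_corse[2], domain_corse[3],
                              0., 2000., '2019-01-01', '2019-11-30'])
url = fetcher.fetcher.url  # assumed attribute (the traceback shows the erddap fetcher reading self.url)

fs = fsspec.filesystem("http")
print("reported size:", fs.size(url))        # size advertised by the server
with fs.open(url) as f:                       # same ranged-read path as in the traceback
    print("received bytes:", len(f.read()))  # bytes actually streamed

If the two numbers disagree here as well, the issue sits between the ERDDAP server's size reporting and fsspec's block cache, independently of argopy.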

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.0 (default, Dec 10 2019, 10:35:48)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.15.0-120-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3

argopy: 0.1.5
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
setuptools: 49.2.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.16.1
sphinx: 3.1.2

@quai20
Member

quai20 commented Oct 20, 2020

Hi @ctroupin !

An issue about this error was raised some weeks ago. I think @gmaze implemented something about it in #28, which is merged into master.
I can't reproduce the error with my argopy (0.1.6, installed from master). Note that if errors about the request size are raised, you now have a way to fetch larger datasets: argopy.readthedocs.io/en/latest/performances.html#Parallel-data-fetching

For the record, can you output your versions of erddapy and fsspec? (In the last release they are not included in show_versions().)

import erddapy
print("erddapy", erddapy.__version__)
import fsspec
print("fsspec", fsspec.__version__)

Thanks,

@ctroupin
Author

Hello Kevin,
yes, I noticed I don't have the latest version; I'm doing the upgrade now and will keep you informed.
I'll also try the parallel data fetching.

Thanks

@ctroupin
Author

ctroupin commented Oct 20, 2020

OK, so I've upgraded to version 0.1.6 and the error still happens.

import erddapy
print("erddapy", erddapy.__version__)
import fsspec
print("fsspec", fsspec.__version__)

gives

erddapy 0.5.3
fsspec 0.7.4

so I will probably install directly from the master branch.

Edit: after installing the master branch, no changes.

Also testing the parallel option, for instance (copied from the doc):

box = [-60, 0, 0.0, 60.0, 0.0, 500.0, "2007", "2010"]

# Init a parallel fetcher:
loader_par = ArgoDataFetcher(src='erddap',
                             parallel=True,
                             ).region(box)

returns

TypeError: init() got an unexpected keyword argument 'parallel'

I guess I'm doing something wrong with the update.
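
A quick way to confirm which argopy is actually being imported after an upgrade (a minimal sketch; a stale install earlier on the Python path would explain the missing parallel option):

import argopy
print("argopy", argopy.__version__)   # installed version string
print("loaded from", argopy.__file__) # location of the package actually imported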

@ctroupin
Author

Hello,
continuing the tests:
I've made sure I have the master version and ran the application. For example (taken from the doc):

from argopy import DataFetcher as ArgoDataFetcher
ds = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012']).to_xarray()

returns:

AttributeError                            Traceback (most recent call last)
<ipython-input-2-52b7d204dc29> in <module>
      1 from argopy import DataFetcher as ArgoDataFetcher
----> 2 ds = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012']).to_xarray()

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    270             raise InvalidFetcher(" Initialize an access point (%s) first." %
    271                                  ",".join(self.Fetchers.keys()))
--> 272         xds = self.fetcher.to_xarray(**kwargs)
    273         xds = self.postproccessor(xds)
    274         return xds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self, errors)
    417         if not self.parallel:
    418             if len(self.uri) == 1:
--> 419                 ds = self.fs.open_dataset(self.uri[0])
    420             else:
    421                 ds = self.fs.open_mfdataset(

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/stores/filesystems.py in open_dataset(self, url, *args, **kwargs)
    464         # with self.fs.open(url) as of:
    465         #     ds = xr.open_dataset(of, *args, **kwargs)
--> 466         data = self.fs.cat_file(url)
    467         ds = xr.open_dataset(data, *args, **kwargs)
    468         if "source" not in ds.encoding:

AttributeError: 'HTTPFileSystem' object has no attribute 'cat_file'

I also tried a simpler request:

ds = ArgoDataFetcher().float(6902746).to_xarray()

which returns the same error. I guess it is better to go back to a released version (not master).

@ctroupin
Author

I'm back on version 0.1.6.

ds = ArgoDataFetcher().float(6902746).to_xarray()

now works (as expected).

As you suggested, I tried the parallel fetcher:

# Define a box to load (large enough to trigger chunking):
box = [-60, -30, 40.0, 60.0, 0.0, 100.0, "2007-01-01", "2007-04-01"]

# Instantiate a parallel fetcher:
loader_par = ArgoDataFetcher(src='erddap', parallel=True).region(box)

and got a TypeError:

-------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8183107cd940> in <module>
      3 
      4 # Instantiate a parallel fetcher:
----> 5 loader_par = ArgoDataFetcher(src='erddap', parallel=True).region(box)

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in region(self, box)
    193         """
    194         if 'region' in self.Fetchers:
--> 195             self.fetcher = self.Fetchers['region'](box=box, **self.fetcher_options)
    196             self._AccessPoint = 'region'  # Register the requested access point
    197         else:

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in __init__(self, ds, cache, cachedir, **kwargs)
     87         self.dataset_id = OPTIONS['dataset'] if ds == '' else ds
     88         self.server = api_server
---> 89         self.init(**kwargs)
     90         self._init_erddapy()
     91 

TypeError: init() got an unexpected keyword argument 'parallel'

Last thing I tried: splitting my request into shorter periods of time to avoid the Got more bytes so far... error, and running a loop over all the periods (one month each, for example; see the sketch after the error below). That worked for a few months, until I got:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
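
For reference, the monthly loop described above might look roughly like this (a sketch; the monthly boundaries and the concatenation along N_POINTS are illustrative, not taken from the original report):

import pandas as pd
import xarray as xr
from argopy import DataFetcher as ArgoDataFetcher

argo_loader = ArgoDataFetcher()
domain_corse = (8.14516, 9.9408, 40.716401, 43.31488)
months = pd.date_range("2019-01-01", "2019-12-01", freq="MS")  # month starts, Jan-Dec 2019

chunks = []
for start, end in zip(months[:-1], months[1:]):
    # one ERDDAP request per month to keep each response small
    ds = argo_loader.region([domain_corse[0], domain_corse[1],
                             domain_corse[2], domain_corse[3],
                             0., 2000.,
                             start.strftime("%Y-%m-%d"),
                             end.strftime("%Y-%m-%d")]).to_xarray()
    chunks.append(ds)

# stitch the monthly pieces back together along the point dimension
ds_year = xr.concat(chunks, dim="N_POINTS")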

Any hint is obviously welcome.

@quai20
Member

quai20 commented Oct 29, 2020

Hey @ctroupin
I feel like a clean reinstall may be a good idea.

pip uninstall argopy
pip install git+https://github.com/euroargodev/argopy.git@master

There is some versioning weirdness between xarray and fsspec, so I would also advise not going higher than fsspec 0.8.0 and xarray 0.16.0.
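
For example, a pinned reinstall along those lines could look like this (the exact pins are only a sketch of the advice above, not a tested recipe):

pip uninstall argopy
pip install "fsspec<=0.8.0" "xarray<=0.16.0" git+https://github.com/euroargodev/argopy.git@master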

Also:

Last thing I tried: splitting my request into shorter periods of time to avoid the Got more bytes so far... error, and running a loop over all the periods (one month each, for example).

That's exactly what the parallel option is doing for you.

@gmaze gmaze added the bug Something isn't working label Nov 10, 2020
@ctroupin
Author

Well, I started from scratch but unfortunately bumped into issue #70, related to erddapy.

Do you recommend working with an older version?

Thanks

@gmaze
Member

gmaze commented Nov 16, 2020

Hi @ctroupin
We're reviewing PR #65, which should fix all of these.
Once merged, we will issue a new release that should be more stable and work with all recent versions of xarray, erddapy and fsspec!
In the meantime, you can check here:
https://github.com/euroargodev/argopy/tree/xarray-016/ci/requirements
the versions of the dependencies the new release will work with (see the py3.*-dev.yml files).

@ctroupin
Author

Hello @gmaze, I'll try what you suggest, or maybe I'll wait for the new release (quite a busy time right now) ;)

Thanks!

@gmaze
Member

gmaze commented Sep 1, 2021

Closing because it seems like the problem is solved with the latest release or the master version.

@gmaze gmaze closed this as completed Sep 1, 2021
@ctroupin
Author

ctroupin commented Sep 1, 2021

Thanks @gmaze, looking forward to using the latest release, I really love the tool!
