ValueError: Got more bytes so far (>2602512) than requested (2594828) when using argo_loader #67

Closed
ctroupin opened this issue Oct 19, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@ctroupin

I'm running a simple extraction of data over a rather small domain. The code works, but for some combinations of parameters I get the ValueError mentioned in the title.

MCVE Code Sample

import argopy
from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher()

domain_corse = (8.14516, 9.9408, 40.716401, 43.31488)
ds = argo_loader.region([domain_corse[0], domain_corse[1], domain_corse[2], domain_corse[3], 
                         0., 2000., '2019-01-01', '2019-11-30']).to_xarray()

ValueError Traceback (most recent call last)
in
----> 1 ds2c = argo_loader.region([domain_corse400[0], domain_corse400[1], domain_corse400[2], domain_corse400[3],
2 0., 2000., '2019-01-01','2019-11-30']).to_xarray()

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
220 if self._AccessPoint not in self.valid_access_points:
221 raise InvalidFetcherAccessPoint(" Initialize an access point (%s) first." % ",".join(self.Fetchers.keys()))
--> 222 xds = self.fetcher.to_xarray(**kwargs)
223 xds = self.postproccessor(xds)
224 return xds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self)
319
320 # Download data
--> 321 ds = self.fs.open_dataset(self.url)
322 ds = ds.rename({'row': 'N_POINTS'})
323

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/stores/fsspec_wrappers.py in open_dataset(self, url, **kwargs)
251 try:
252 with self.fs.open(url) as of:
--> 253 ds = xr.open_dataset(of, **kwargs)
254 self.register(url)
255 return ds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
536 engine = _get_engine_from_magic_number(filename_or_obj)
537 if engine == "scipy":
--> 538 store = backends.ScipyDataStore(filename_or_obj, **backend_kwargs)
539 elif engine == "h5netcdf":
540 store = backends.H5NetCDFStore.open(

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in __init__(self, filename_or_obj, mode, format, group, mmap, lock)
135 )
136 else:
--> 137 scipy_dataset = _open_scipy_netcdf(
138 filename_or_obj, mode=mode, mmap=mmap, version=version
139 )

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in _open_scipy_netcdf(filename, mode, mmap, version)
81
82 try:
---> 83 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version)
84 except TypeError as e: # netcdf3 message is obscure in this case
85 errmsg = e.args[0]

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in __init__(self, filename, mode, mmap, version, maskandscale)
279
280 if mode in 'ra':
--> 281 self._read()
282
283 def __setattr__(self, attr, value):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read(self)
608 self._read_dim_array()
609 self._read_gatt_array()
--> 610 self._read_var_array()
611
612 def _read_numrecs(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read_var_array(self)
696 pos = self.fp.tell()
697 self.fp.seek(begin_)
--> 698 data = frombuffer(self.fp.read(a_size), dtype=dtype_
699 ).copy()
700 data.shape = shape

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in read(self, length)
276 else:
277 length = min(self.size - self.loc, length)
--> 278 return super().read(length)
279
280 def _fetch_all(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/spec.py in read(self, length)
1237 # don't even bother calling fetch
1238 return b""
-> 1239 out = self.cache._fetch(self.loc, self.loc + length)
1240 self.loc += len(out)
1241 return out

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/caching.py in _fetch(self, start, end)
354 self.start = start
355 else:
--> 356 new = self.fetcher(self.end, bend)
357 self.cache = self.cache + new
358

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in _fetch_range(self, start, end)
328 cl += len(chunk)
329 if cl > end - start:
--> 330 raise ValueError(
331 "Got more bytes so far (>%i) than requested (%i)"
332 % (cl, end - start)

ValueError: Got more bytes so far (>2602512) than requested (2594828)

Expected Output

When I run the same code, except that I change the end date:

ds = argo_loader.region([domain_corse[0], domain_corse[1], domain_corse[2], domain_corse[3], 
                         0., 2000., '2019-01-01', '2019-11-29']).to_xarray()

then it works.
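
One way to narrow this down (a diagnostic sketch, not part of the original report) is to take the ERDDAP request built by argopy and read it directly with fsspec, to check whether the size mismatch comes from the ranged HTTP reads rather than from argopy itself. The .fetcher.url attribute is an assumption based on the traceback above; adjust it if your argopy version exposes the request URL differently.

import fsspec

# Build the same request as in the failing case
fetcher = argo_loader.region([domain_corse[0], domain_corse[1],
                              domain_corse[2], domain_corse[3],
                              0., 2000., '2019-01-01', '2019-11-30'])
url = fetcher.fetcher.url  # assumed attribute (the traceback shows the erddap fetcher reading self.url)

fs = fsspec.filesystem("http")
print("reported size:", fs.size(url))        # size advertised by the server
with fs.open(url) as f:                       # same ranged-read path as in the traceback
    print("received bytes:", len(f.read()))  # bytes actually streamed

If the two numbers disagree here as well, the issue sits between the ERDDAP server's size reporting and fsspec's block cache, independently of argopy.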

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.0 (default, Dec 10 2019, 10:35:48)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.15.0-120-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3

argopy: 0.1.5
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
setuptools: 49.2.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.16.1
sphinx: 3.1.2

@quai20
Member

quai20 commented Oct 20, 2020

Hi @ctroupin !

An issue about this error was raised some weeks ago. I think @gmaze implemented something about it in #28, which is merged into master.
I can't reproduce the error with my argopy (0.1.6, installed from master). Note that if errors about the request size are raised, you now have a way to fetch larger datasets: argopy.readthedocs.io/en/latest/performances.html#Parallel-data-fetching

For the record, can you output your versions of erddapy and fsspec? (In the last release they are not included in show_versions().)

import erddapy
print("erddapy", erddapy.__version__)
import fsspec
print("fsspec", fsspec.__version__)

Thanks,

@ctroupin
Author

Hello Kevin,
yes, I noticed I don't have the latest version; I'm doing the upgrade now and will keep you informed.
I'll also try the parallel data fetching.

Thanks

@ctroupin
Author

ctroupin commented Oct 20, 2020

OK, so I've upgraded to version 0.1.6 and the error still happens.

import erddapy
print("erddapy", erddapy.__version__)
import fsspec
print("fsspec", fsspec.__version__)

gives

erddapy 0.5.3
fsspec 0.7.4

so I will probably install directly from the master branch.

Edit: after installing the master branch, no changes.

Also testing the parallel option, for instance (copied from the doc):

box = [-60, 0, 0.0, 60.0, 0.0, 500.0, "2007", "2010"]

# Init a parallel fetcher:
loader_par = ArgoDataFetcher(src='erddap',
                             parallel=True,
                             ).region(box)

returns

TypeError: init() got an unexpected keyword argument 'parallel'

I guess I'm doing something wrong with the update.
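
A quick way to confirm which argopy is actually being imported after an upgrade (a minimal sketch; a stale install earlier on the Python path would explain the missing parallel option):

import argopy
print("argopy", argopy.__version__)   # installed version string
print("loaded from", argopy.__file__) # location of the package actually imported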

@ctroupin
Author

Hello,
continuing the tests:
I've made sure I have the master version and ran the application. For example (taken from the doc):

from argopy import DataFetcher as ArgoDataFetcher
ds = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012']).to_xarray()

returns:

AttributeError                            Traceback (most recent call last)
<ipython-input-2-52b7d204dc29> in <module>
      1 from argopy import DataFetcher as ArgoDataFetcher
----> 2 ds = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012']).to_xarray()

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    270             raise InvalidFetcher(" Initialize an access point (%s) first." %
    271                                  ",".join(self.Fetchers.keys()))
--> 272         xds = self.fetcher.to_xarray(**kwargs)
    273         xds = self.postproccessor(xds)
    274         return xds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self, errors)
    417         if not self.parallel:
    418             if len(self.uri) == 1:
--> 419                 ds = self.fs.open_dataset(self.uri[0])
    420             else:
    421                 ds = self.fs.open_mfdataset(

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/stores/filesystems.py in open_dataset(self, url, *args, **kwargs)
    464         # with self.fs.open(url) as of:
    465         #     ds = xr.open_dataset(of, *args, **kwargs)
--> 466         data = self.fs.cat_file(url)
    467         ds = xr.open_dataset(data, *args, **kwargs)
    468         if "source" not in ds.encoding:

AttributeError: 'HTTPFileSystem' object has no attribute 'cat_file'

I also tried a simpler request:

ds = ArgoDataFetcher().float(6902746).to_xarray()

which returns the same error. I guess it is better to go back to a released version (not master).

@ctroupin
Author

I'm back on version 0.1.6.

ds = ArgoDataFetcher().float(6902746).to_xarray()

now works (as expected).

As you suggested, I tried the parallel fetcher:

# Define a box to load (large enough to trigger chunking):
box = [-60, -30, 40.0, 60.0, 0.0, 100.0, "2007-01-01", "2007-04-01"]

# Instantiate a parallel fetcher:
loader_par = ArgoDataFetcher(src='erddap', parallel=True).region(box)

and got a TypeError:

-------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8183107cd940> in <module>
      3 
      4 # Instantiate a parallel fetcher:
----> 5 loader_par = ArgoDataFetcher(src='erddap', parallel=True).region(box)

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in region(self, box)
    193         """
    194         if 'region' in self.Fetchers:
--> 195             self.fetcher = self.Fetchers['region'](box=box, **self.fetcher_options)
    196             self._AccessPoint = 'region'  # Register the requested access point
    197         else:

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in __init__(self, ds, cache, cachedir, **kwargs)
     87         self.dataset_id = OPTIONS['dataset'] if ds == '' else ds
     88         self.server = api_server
---> 89         self.init(**kwargs)
     90         self._init_erddapy()
     91 

TypeError: init() got an unexpected keyword argument 'parallel'

Last thing I tried: splitting my request into shorter periods of time to avoid the Got more bytes so far... error, and running a loop over all the periods (one month each, for example; see the sketch after the error below). That worked for a few months, until I got:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
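
For reference, the monthly loop described above might look roughly like this (a sketch; the monthly boundaries and the concatenation along N_POINTS are illustrative, not taken from the original report):

import pandas as pd
import xarray as xr
from argopy import DataFetcher as ArgoDataFetcher

argo_loader = ArgoDataFetcher()
domain_corse = (8.14516, 9.9408, 40.716401, 43.31488)
months = pd.date_range("2019-01-01", "2019-12-01", freq="MS")  # month starts, Jan-Dec 2019

chunks = []
for start, end in zip(months[:-1], months[1:]):
    # one ERDDAP request per month to keep each response small
    ds = argo_loader.region([domain_corse[0], domain_corse[1],
                             domain_corse[2], domain_corse[3],
                             0., 2000.,
                             start.strftime("%Y-%m-%d"),
                             end.strftime("%Y-%m-%d")]).to_xarray()
    chunks.append(ds)

# stitch the monthly pieces back together along the point dimension
ds_year = xr.concat(chunks, dim="N_POINTS")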

Any hint is obviously welcome.

@quai20
Member

quai20 commented Oct 29, 2020

Hey @ctroupin
I feel like a clean reinstall may be a good idea.

pip uninstall argopy
pip install git+https://github.com/euroargodev/argopy.git@master

There is some versioning weirdness between xarray and fsspec, so I would also advise not going higher than fsspec 0.8.0 and xarray 0.16.0.
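
For example, a pinned reinstall along those lines could look like this (the exact pins are only a sketch of the advice above, not a tested recipe):

pip uninstall argopy
pip install "fsspec<=0.8.0" "xarray<=0.16.0" git+https://github.com/euroargodev/argopy.git@master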

Also:

Last thing I tried: splitting my request into shorter periods of time to avoid the Got more bytes so far... error, and running a loop over all the periods (one month each, for example).

That's exactly what the parallel option is doing for you.

@gmaze gmaze added the bug Something isn't working label Nov 10, 2020
@ctroupin
Author

Well, I started from scratch but unfortunately bumped into issue #70, related to erddapy.

Do you recommend working with an older version?

Thanks

@gmaze
Member

gmaze commented Nov 16, 2020

Hi @ctroupin
We're reviewing PR #65, which should fix all of these.
Once merged, we will issue a new release that should be more stable and work with all recent versions of xarray, erddapy and fsspec!
In the meantime, you can check here:
https://github.com/euroargodev/argopy/tree/xarray-016/ci/requirements
the versions of the dependencies the new release will work with (see the py3.*-dev.yml files).

@ctroupin
Author

Hello @gmaze, I'll try what you suggest, or maybe I'll wait for the new release (quite a busy time right now) ;)

Thanks!

@gmaze
Member

gmaze commented Sep 1, 2021

Closing because it seems like the problem is solved with the latest release or the master version.

@gmaze gmaze closed this as completed Sep 1, 2021
@ctroupin
Author

ctroupin commented Sep 1, 2021

Thanks @gmaze, looking forward to using the latest release, I really love the tool!
