Reading Dataset from memory #406
Greetings! We are working on this on the C side at Unidata, and hopefully it will be available before too long. Cheers! Sean
To be clear, though, this would involve passing the entire "string" of data, not a file-like object.
you can create a Dataset in memory, using 'diskless=True'. |
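For reference, a minimal sketch of that diskless creation path (the file name and variable names here are placeholders; with diskless=True nothing is written to disk unless persist=True is also passed):

import numpy as np
from netCDF4 import Dataset

# 'scratch.nc' is only a label for the in-memory dataset; no file is created on disk
nc = Dataset('scratch.nc', mode='w', diskless=True)
nc.createDimension('x', 10)
v = nc.createVariable('v', 'f8', ('x',))
v[:] = np.arange(10, dtype='f8')
nc.close()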
Create, yes, but read, no. Let's say you use the NetcdfSubset service from the THREDDS Data Server. Sean
you could copy the data directly to a diskless file (without first writing to disk), couldn't you?
Define "diskless file". NetCDF requires a filename to read data from.
from netCDF4 import Dataset and open the URL directly; then you have a 'diskless' or in-memory version of the dataset at URL.
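For what it's worth, a sketch of that remote-access route, assuming the data are exposed through an OPeNDAP endpoint and the installed netcdf-c was built with DAP support (the URL below is a placeholder, not a real dataset):

from netCDF4 import Dataset

# placeholder OPeNDAP endpoint (a THREDDS "dodsC" URL, not a fileServer download link)
OPENDAP_URL = 'https://example.com/thredds/dodsC/some_dataset.nc'

nc = Dataset(OPENDAP_URL)      # no local file is created; data are fetched lazily over the network
print(list(nc.variables))
nc.close()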
Is something like the following possible?

from netCDF4 import Dataset
from urllib.request import urlopen

url = urlopen(URL)
ncdata = url.read()
nc = Dataset(ncdata, diskless=True, mode='r')
Passing a string seems like a fine idea; in my case it would be convenient enough (I read 10 MB netCDF files, which are not a problem to hold in Python memory). PS. I don't really know the binary structure of the netCDF4 format; even if it cannot be read/written incrementally, handling buffer consumption on the C side of the Python extension could be good for large files.
Well, right now the C netcdf library only takes a filename or an opendap URL; there's not even the option of taking any kind of file pointer. Even if the latter were possible, you still wouldn't be able to turn a Python file-like object into something the C library could consume directly. What they're adding to the netCDF C library is an API to point to an existing in-memory buffer and eliminate all file I/O; HDF5 already has such an API. It would be possible to add a Python API to netcdf4-python to take a file-like object, but at some level here all of the data needs to be read into a buffer, with a single pointer to be handed to the C library. This is likely not to actually be a problem in practice.
There's been some discussion about this over on the h5py issue tracker (h5py/h5py#552). It sounds like some changes to the HDF5 libraries may be necessary to make this work entirely smoothly. In the meantime, if you're working with netCDF3 files, using a file-like object is already possible with the scipy.io.netcdf module.
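A sketch of that netCDF3 route, assuming the bytes already in memory are classic-format (netCDF3) data; scipy's netcdf_file accepts a file-like object such as BytesIO. The source of the bytes is a placeholder here:

import io
from scipy.io import netcdf_file   # in older scipy this lives at scipy.io.netcdf.netcdf_file

# placeholder source of bytes: any classic-format (netCDF3) file already read into memory,
# e.g. downloaded over HTTP; netCDF4/HDF5 files will not work with this reader
nc3_bytes = open('classic_example.nc', 'rb').read()

nc = netcdf_file(io.BytesIO(nc3_bytes), mode='r')
print(list(nc.variables))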
Thank you. Since I read netCDF4 files, for now I'll settle for the NamedTemporaryFile workaround.
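For other readers, a minimal sketch of that workaround (the URL is a placeholder): download the bytes, spill them into a NamedTemporaryFile, and hand its name to Dataset, since netCDF4 only needs a real path:

import tempfile
from urllib.request import urlopen
from netCDF4 import Dataset

DATA_URL = 'https://example.com/some_file.nc'   # placeholder download URL

with urlopen(DATA_URL) as resp, tempfile.NamedTemporaryFile(suffix='.nc') as tmp:
    tmp.write(resp.read())            # spill the downloaded bytes to a real (temporary) file
    tmp.flush()
    nc = Dataset(tmp.name, mode='r')  # open by path while the temporary file still exists
    print(list(nc.variables))
    nc.close()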
I looked into using a local OpenDAP server but couldn't find anything that easily worked for serving local netCDF files. This would be an option if anyone can get it to work.
argh, did the work to create #652, however found: Unidata/netcdf-c#394 :( |
btw, I'm guessing this is a dup of #295 |
btw, there may be another bug in netcdf-c with in-memory files, I just tried with a 2d array of data, and all rows after 100 returned garbage data. Investigating this now |
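For anyone who wants to check this themselves, a minimal round-trip sketch (file and variable names are made up): write a 2-d variable to disk, reopen the same bytes through memory=, and compare. With the bug described above, rows beyond 100 would come back wrong:

import numpy as np
from netCDF4 import Dataset

data = np.arange(200 * 50, dtype='f4').reshape(200, 50)

# write an ordinary file to disk first
nc = Dataset('roundtrip.nc', mode='w')
nc.createDimension('y', 200)
nc.createDimension('x', 50)
var = nc.createVariable('field', 'f4', ('y', 'x'))
var[:] = data
nc.close()

# read the raw bytes back and open them as an in-memory dataset
with open('roundtrip.nc', 'rb') as f:
    buf = f.read()

nc_mem = Dataset('in-mem', mode='r', memory=buf)
readback = nc_mem.variables['field'][:]
print((readback == data).all())   # False would indicate the garbage-rows problem
nc_mem.close()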
sorry for the cross post, but this isn't working for me as per the docs. I've tried Python 2.7 and 3.7 and get the same error.
code:
netCDF4 version '1.5.0'
@tam203 - this should work. I can only think of two reasons it might not.
Can you tell us what version of netcdf-c you have, and post that file somewhere? (If it's small enough you can tar/zip it and attach it to this ticket.)
Could be related to Unidata/netcdf-c#394, which I believe was fixed in netcdf-c 4.5.0.
Thanks. I'm using what came with the pip install. I think you are correct about the bug; it looks like I'm on an older version of netcdf-c.
How do I go about getting version 4.5, and will the pip version be updated shortly? I'm packaging this up on an AWS EC2 machine to use in Lambda, so I need the C library to be packaged with the Python module, not just installed somewhere on the system, if that makes sense.
Ah - I see the linux and osx wheels are built using 4.4.1.1. I will update that and create a new release (1.5.0.1) with new binary wheels. If you have a newer version of the library on your system you can follow the build instructions in the docs to rebuild from source and link against the newer library.
Wheels for 1.5.0.1 are available (using netcdf-c 4.6.3). Please let me know if this fixes the problem.
@jswhit Perfect, that's fixed it, thanks. For anyone's reference, I ran a fresh install to ensure I got the new version. Ta.
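(The exact install command was not preserved in this thread; a typical way to force the rebuilt wheel, offered here as an assumption rather than the poster's literal command, is:)

pip install --upgrade --no-cache-dir netCDF4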
Solved my problem |
Didn't solve my problem, unfortunately.
Installing collected packages: numpy, cftime, netcdf4
Any other ideas?
@kmfweb If you're installing using pip, that means you're using your system's version of netcdf-c (libnetcdf). What version of that is installed?
@dopplershift I have checked using the command ncdump or nc-config --version, which gives me this last line of output: netcdf library version 4.4.1.1 of Jun 8 2018 03:08:32. I have some old netCDF data which could be read before and still can be. But with the new data I would like to read in and then look at, I receive "FileNotFoundError: [Errno 2] No such file or directory: b". The file path and name are correct, and I am able to access the file via ncview as well.
I'm confused. Is this data you have in a file on disk, or data that's already in a buffer in memory? Can you provide sample code for what's not working?
I have been reading in decadal data, e.g. file19701979.nc, file19801989.nc, file19901999.nc etc., using a loop. Within this loop I have a function "New_Data,Latitudes,Longitudes = GetGrid4Slice(FileName,ReadInfo,SliceInfo,LatInfo,LonInfo)" which includes "ncf=netcdf.netcdf_file(FileName,'r')". For those decadal netCDF files it runs through without any problems. I have now got rid of the decades loop, as I am working with only a single netCDF file. For this netCDF file I receive the FileNotFoundError when calling "ncf=netcdf.netcdf_file(FileName,'r')". I am quite sure that the single file I am trying to read is a proper netCDF file, as I am able to look at it using ncview. I am not sure whether this is about my loop or the library's version. The file name and path are definitely correct, and the error message ends with that "b".
Ok, then I think you should open a new issue. This issue is about reading datasets from a buffer that already exists in memory, not a file on disk.
I installed the most recent version of netcdf4 and am getting the error "OSError: [Errno -128] NetCDF: Attempt to use feature that was not turned on when netCDF was built."
@cpaton8 That error message says it all. I'm not sure how you installed the netcdf-c package (libnetcdf.so or libnetcdf.dylib), but it means the library did not have memory-based reading enabled when it was compiled.
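One quick way to see which netcdf-c build the Python module is actually linked against (a diagnostic sketch, not part of the original thread):

import netCDF4

print(netCDF4.__version__)       # version of the Python wrapper (netCDF4)
print(netCDF4.getlibversion())   # build string of the netcdf-c library it is linked against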
@dopplershift all packages installed via conda-forge: libnetcdf 4.7.3 and netcdf 1.4.3.
@cpaton8 A couple things:
So, on macOS, in this environment:
this code works fine:

import requests
import netCDF4

link = ('https://thredds.ucar.edu/thredds/fileServer/satellite/goes/east/grb/ABI/Mesoscale-2/Channel08/'
        'current/OR_ABI-L1b-RadM2-M6C08_G16_s20201221854495_e20201221854553_c20201221855015.nc')

with requests.get(link) as resp:
    netcdf_file = netCDF4.Dataset('in-mem-file', mode='r', memory=resp.content)
    print(netcdf_file.title)
@cpaton8 I am having a similar problem with netCDF version 4.6.0. When I run your above example:

import requests
import netCDF4

link = ('https://thredds.ucar.edu/thredds/fileServer/satellite/goes/east/grb/ABI/Mesoscale-2/Channel08/'
        'current/OR_ABI-L1b-RadM2-M6C08_G16_s20201221854495_e20201221854553_c20201221855015.nc')

with requests.get(link) as resp:
    netcdf_file = netCDF4.Dataset('in-mem-file', mode='r', memory=resp.content)
    print(netcdf_file.title)

I get the error:
My python 3.6.9 environment looks like this:
Thoughts?
@nicksilver I find it really useful in cases like this to look at what's being returned by requests. If I take the code that's failing for you and print out the response:

import requests
import netCDF4

link = ('https://thredds.ucar.edu/thredds/fileServer/satellite/goes/east/grb/ABI/Mesoscale-2/Channel08/'
        'current/OR_ABI-L1b-RadM2-M6C08_G16_s20201221854495_e20201221854553_c20201221855015.nc')

with requests.get(link) as resp:
    print(resp.content.decode('utf-8'))

I see:
So the original data file has aged off the server. If I update to a currently available file:

import requests
import netCDF4

link = ('https://thredds.ucar.edu/thredds/fileServer/satellite/goes/east/grb/ABI/Mesoscale-2/Channel08/'
        'current/OR_ABI-L1b-RadM2-M6C08_G16_s20202261740546_e20202261741003_c20202261741040.nc')

with requests.get(link) as resp:
    netcdf_file = netCDF4.Dataset('in-mem-file', mode='r', memory=resp.content)
    print(netcdf_file.title)

I get:
@nicksilver not sure if this is the issue you are running into, but the MODIS files we've been working with are HDF-EOS v2, which is based on HDF4. They would need to be converted (there's a tool called h4toh5) before they're compatible with netCDF-4.
Beautiful... thank you!
Can netcdf4 read a Dataset from memory?
Something like this:
It would be quite a convenient and useful API.
For now, to read a netCDF4 Dataset from a URL I have to explicitly save it as a named temporary file.
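For later readers: the memory= keyword demonstrated further up in the thread covers exactly this use case, provided the linked netcdf-c was built with in-memory support. A minimal sketch with a placeholder URL:

from urllib.request import urlopen
from netCDF4 import Dataset

DATA_URL = 'https://example.com/some_file.nc'   # placeholder; any HTTP download of a netCDF file

with urlopen(DATA_URL) as resp:
    nc = Dataset('in-memory', mode='r', memory=resp.read())   # no temporary file needed

print(list(nc.variables))
nc.close()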