Several error cases when reading from opendap #1136

Closed · 4 of 12 tasks
vegardb opened this issue Sep 6, 2018 · 14 comments

@vegardb commented Sep 6, 2018

Environment Information

  • What platform are you using? (please provide specific distribution/version in summary)
    • Linux
    • Windows
    • OSX
    • Other
    • NA
  • 32 and/or 64 bit?
    • 32-bit
    • 64-bit
  • What build system are you using?
    • autotools (configure)
    • cmake
  • Can you provide a sample netCDF file or C code to recreate the issue?
    • Yes (please attach to this issue, thank you!)
    • No
    • Not at this time

Summary of Issue

When reading large amounts of data I randomly get one of two errors:

  • DAP failure
  • No failure, but the grid contains only zeros.

I have tested with versions 4.6.1 (which fails) and 4.4 (which does not).

With my test dataset, failures always occur after 10-20 seconds. With version 4.4 I get a seemingly correct result after around 10 minutes (I am on a slow internet connection). Fast-loading grids do not seem to trigger this error.

Steps to reproduce the behavior

Compile and run the code below, like this:

$ export LD_LIBRARY_PATH=<path to netcdf 4.6 or 4.4 libdir>
$ export PKG_CONFIG_PATH=$LD_LIBRARY_PATH/pkgconfig
$ gcc test.c -otest $(pkg-config --cflags --libs netcdf) 
$ ./test http://thredds.met.no/thredds/dodsC/meps25files/t2myr_kf_0_5km_latest.nc  air_temperature_2m

You should either get "Data contains only zeros" or "DAP failure".

Retry with 4.4 and get a correct result.

I have tested with several datasets. If you do too, please note that the source code below is hardcoded to read float values only.

#include <stdlib.h>
#include <stdio.h>
#include <netcdf.h>

/* Exit with the netCDF error message if a call failed. */
void check(int result) {
    if (result != NC_NOERR) {
        printf("%d\t%s\n", result, nc_strerror(result));
        exit(1);
    }
}

void read_data(const char* file, const char* variable)
{
    int ds;
    int var;
    int nDims;
    int* dims;

    check(nc_open(file, NC_NOWRITE, &ds));
    check(nc_inq_varid(ds, variable, &var));
    check(nc_inq_varndims(ds, var, &nDims));
    dims = (int*) malloc(sizeof(int) * nDims);
    check(nc_inq_vardimid(ds, var, dims));

    /* Total element count is the product of all dimension lengths. */
    size_t size = 1;
    for (int i = 0; i < nDims; ++i)
    {
        size_t len;
        check(nc_inq_dimlen(ds, dims[i], &len));
        size *= len;
    }

    printf("size: %zu\n", size);
    float* data = (float*) malloc(sizeof(float) * size);
    check(nc_get_var_float(ds, var, data));

    /* If any element is non-zero, print the first 20 values;
       otherwise report the all-zero grid that this issue is about. */
    size_t i;
    for (i = 0; i < size; ++i)
        if (data[i] != 0)
            break;
    if (i < size)
        for (int j = 0; j < 20 && (size_t) j < size; ++j)
            printf("%d:\t%f\n", j, data[j]);
    else
        puts("Data contains only zeros");

    free(data);
    free(dims);
    check(nc_close(ds));
}

int main(int argc, char** argv) {
    if (argc < 3) {
        fprintf(stderr, "usage: %s <file-or-url> <variable>\n", argv[0]);
        return 1;
    }
    read_data(argv[1], argv[2]);
    return 0;
}
@DennisHeimbigner (Collaborator)

There is a known problem with the dap code not reporting lost connections.
I have a fix, but it is not yet available. As a stopgap, you can try the following.
In the file libdispatch/dauth.c, about line 33, it should currently say:
"HTTP.TIMEOUT","10", /*seconds */
Try changing the 10 to some much larger value, for example:
"HTTP.TIMEOUT","3600", /*seconds */
and see if that helps; note that your request may take a much longer time
to complete, but the result should be correct.
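
For context, a minimal sketch of that stopgap edit as a diff against libdispatch/dauth.c (the exact line number and the layout of the surrounding defaults table may differ between versions):

-    "HTTP.TIMEOUT","10", /*seconds */
+    "HTTP.TIMEOUT","3600", /*seconds */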

@DennisHeimbigner (Collaborator)

I will assume the suggestion solved the problem.

@gauteh commented Jan 9, 2020

I tried setting HTTP.TIMEOUT in ~/.dodsrc:

HTTP.TIMEOUT=1800

but ncdump (version 4.6.0) does not seem to care about this. Should this not also fix the issue?

@DennisHeimbigner (Collaborator) commented Jan 9, 2020

The timeout is applied deep inside the netcdf-c library, so ncdump itself is
probably not involved. More likely, the .dodsrc timeout line is being
ignored for some reason. Let me do some checking.

@DennisHeimbigner (Collaborator)

Ok, I did a check, and that .dodsrc value is being picked up
and applied to the curl connection via CURLOPT_TIMEOUT.
What makes you think it is not working?
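
For reference, a minimal sketch of what that amounts to at the libcurl level. This is not the actual netcdf-c code; the URL and the 1800-second value are just illustrative:

#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    CURL* curl = curl_easy_init();
    if (!curl) return 1;
    /* What netcdf-c effectively does with the HTTP.TIMEOUT value:
       cap the whole transfer at the configured number of seconds. */
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 1800L);
    curl_easy_setopt(curl, CURLOPT_URL,
        "http://thredds.met.no/thredds/dodsC/meps25files/t2myr_kf_0_5km_latest.nc.dds");
    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "curl: %s\n", curl_easy_strerror(rc));
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}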

@gauteh commented Jan 9, 2020 via email

@gauteh commented Jan 9, 2020 via email

@DennisHeimbigner (Collaborator)

Did you spell it as Http.verbose or HTTP.VERBOSE?
In any case, this should just set the CURLOPT_VERBOSE
flag to 1. Is stderr being redirected, by any chance?
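
For anyone following along, the spelling in ~/.dodsrc would presumably follow the same uppercase key convention as the HTTP.TIMEOUT line shown earlier:

HTTP.VERBOSE=1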

@gauteh commented Jan 9, 2020 via email

@DennisHeimbigner (Collaborator)

It is working for me. For example, this command:

./ncdump -h 'http://thredds.met.no/thredds/dodsC/meps25files/t2myr_kf_0_5km_latest.nc'

produces this output:

* STATE: INIT => CONNECT handle 0x800cd240; line 1491 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0x800cd240; line 1532 (connection #0)
*   Trying 157.249.177.224:80...
* TCP_NODELAY set
* STATE: WAITRESOLVE => WAITCONNECT handle 0x800cd240; line 1611 (connection #0)
* Connected to thredds.met.no (157.249.177.224) port 80 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x800cd240; line 1667 (connection #0)
* Marked for [keep alive]: HTTP default
* STATE: SENDPROTOCONNECT => DO handle 0x800cd240; line 1685 (connection #0)
> GET /thredds/dodsC/meps25files/t2myr_kf_0_5km_latest.nc.dds HTTP/1.1
> Host: thredds.met.no
> User-Agent: oc4.7.4-development
> Accept: */*
>
* STATE: DO => DO_DONE handle 0x800cd240; line 1756 (connection #0)
* STATE: DO_DONE => PERFORM handle 0x800cd240; line 1877 (connection #0)
* Mark bundle as not supporting multiuse
* HTTP 1.1 or later with persistent connection
< HTTP/1.1 404
< Server: nginx
< Date: Thu, 09 Jan 2020 18:33:12 GMT
< Content-Type: text/plain; charset=utf8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Access-Control-Allow-Origin: *
< Content-Description: dods-error
<
* STATE: PERFORM => DONE handle 0x800cd240; line 2067 (connection #0)
* multi_done
* Connection #0 to host thredds.met.no left intact
* Expire cleared (transfer 0x800cd240)
./ncdump: http://thredds.met.no/thredds/dodsC/meps25files/t2myr_kf_0_5km_latest.nc: http://thredds.met.no/thredds/dodsC/meps25files/t2myr_kf_0_5km_latest.nc: NetCDF: file not found
BlackBird:ncdump:

@gauteh commented Jan 9, 2020

Is this with ncdump version 4.6.0? But the main issue is the timeout / zeroes returned; this was mostly for debugging the issue. It is especially worrisome that zeroes are returned without any error. This could remain an error even when the timeout is increased; it is not entirely unlikely that the timeout will be reached with large datasets.

Note that this happens even if the DAP server is still streaming data; it is just that the request has not completed before the timeout, so ncdump breaks the connection. I see the same behaviour with ncks.

Ubuntu bug for upgrading: https://bugs.launchpad.net/ubuntu/+source/netcdf/+bug/1859070
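
[Editor's note: not proposed in the thread, but one way to keep each DAP request short enough to finish within the timeout is to read the variable in slices with nc_get_vara_float. The slicing scheme below is a sketch under the assumption of a 3D float variable, not something ncdump or ncks actually does.]

#include <netcdf.h>

/* Sketch: read one step of the first (e.g. time) dimension at a time,
   so each underlying DAP request stays small. Assumes a 3D float
   variable with dimension lengths n0 x n1 x n2 and a caller-allocated
   buffer of n0*n1*n2 floats. */
int read_in_slices(int ds, int var, size_t n0, size_t n1, size_t n2, float* out)
{
    size_t start[3] = {0, 0, 0};
    size_t count[3] = {1, n1, n2};   /* one slab of the first dimension */
    for (size_t t = 0; t < n0; ++t) {
        start[0] = t;
        int rc = nc_get_vara_float(ds, var, start, count, out + t * n1 * n2);
        if (rc != NC_NOERR)
            return rc;               /* surface the error instead of zeros */
    }
    return NC_NOERR;
}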

@DennisHeimbigner (Collaborator)

But the underlying problem of timeouts is still there, correct?

@gauteh commented Dec 3, 2021

Yes, I believe so. If the timeout cannot be avoided, I think it would be useful to get an error.

@DennisHeimbigner (Collaborator)

I agree; this will primarily be determined by what libcurl can do.
