Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with WMS loading >2GB variables #43

Closed
jbzambon opened this issue Jul 30, 2019 · 13 comments
Closed

Issue with WMS loading >2GB variables #43

jbzambon opened this issue Jul 30, 2019 · 13 comments

Comments

@jbzambon
Copy link

I am attempting to load a large dataset (4200x4100) with 21 timesteps into WMS using THREDDS. When I do so, it fails and the page returns...

<ServiceExceptionReport xmlns="http://www.opengis.net/ogc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.3.0" xsi:schemaLocation="http://www.opengis.net/ogc http://schemas.opengis.net/wms/1.3.0/exceptions_1_3_0.xsd">
<ServiceException>
Unexpected error of type java.lang.IllegalStateException
</ServiceException>
<StackTrace>
<![CDATA[
uk.ac.rdg.resc.edal.cdm.LookUpTable.<init>(LookUpTable.java:109)uk.ac.rdg.resc.edal.cdm.LookUpTableGrid.generate(LookUpTableGrid.java:93)uk.ac.rdg.resc.edal.cdm.CdmUtils.createHorizontalGrid(CdmUtils.java:279)uk.ac.rdg.resc.edal.cdm.CdmUtils.readCoverageMetadata(CdmUtils.java:174)uk.ac.rdg.resc.edal.cdm.CdmUtils.readCoverageMetadata(CdmUtils.java:127)thredds.server.wms.ThreddsDataset.<init>(ThreddsDataset.java:95)thredds.server.wms.ThreddsDataset.getThreddsDatasetForRequest(ThreddsDataset.java:270)thredds.server.wms.ThreddsWmsController.dispatchWmsRequest(ThreddsWmsController.java:165)uk.ac.rdg.resc.ncwms.controller.AbstractWmsController.handleRequestInternal(AbstractWmsController.java:207)org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:174)org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:50)org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)javax.servlet.http.HttpServlet.service(HttpServlet.java:634)org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)javax.servlet.http.HttpServlet.service(HttpServlet.java:741)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.servlet.filter.RequestQueryFilter.doFilter(RequestQueryFilter.java:118)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.servlet.filter.RequestCORSFilter.doFilterInternal(RequestCORSFilter.java:49)org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:347)org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:263)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.servlet.filter.RequestPathFilter.doFilter(RequestPathFilter.java:94)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.server.RequestBracketingLogMessageFilter.doFilter(RequestBracketingLogMessageFilter.java:81)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:71)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:137)org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:660)org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:798)org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:808)org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)java.lang.Thread.run(Thread.java:748)
]]>
</StackTrace>
</ServiceExceptionReport>

and threddsServlet.log returns...

2019-07-30T01:56:09.069 +0000 [ 30415][ 5] ERROR - thredds.server.wms.ThreddsWmsController - dispatchWmsRequest(): Exception: java.lang.IllegalStateException: nLon (=0) and nLat (=2147483647) must be positive and > 0

I notice that 2147483647 is 2^31 (signed 32-bit number). Does WMS handle arrays as 32-bit in some locations that's causing us to run into this limit? Note that OPeNDAP handles this dataset without issue and if I subset the dataset with a stride of 4 (1050x1025), WMS is also able to run without issue.

I have put the datasets on AWS for inspection.
OPeNDAP endpoint... http://54.158.195.139:8080/thredds/dodsC/nwa/v1/NWA_v1_best.ncd.html

WMS endpoint...
http://54.158.195.139:8080/thredds/wms/nwa/v1/NWA_v1_best.ncd?service=WMS&version=1.3.0&request=GetCapabilities

Thanks!

-Joe

@lesserwhirls
Copy link
Collaborator

It looks like there is an issue somewhere deep down in the ncWMS code when it constructs a lookup table for quick mapping lat/lon values to array indexes. This particular version of ncWMS isn't maintained any more. However, in the next version of the TDS (v5.0.0), we are using the latest version from edal-java, which might not have this limitation. Would you mind trying the latest 5.0.0 beta (here) and see if you still get this error? If so, we can open an issue on the edal-java repo and see if it's something that is fixable.

@jbzambon
Copy link
Author

Thanks for your quick reply! Yes, upgrading TDS (v5.0.0) was the idea that @rsignell-usgs had offered in a side chat. I was wary about using beta software but will be happy to give it a try and report back. Thanks again!

@lesserwhirls
Copy link
Collaborator

lesserwhirls commented Jul 30, 2019

I can totally understand being wary. For WMS, there are a few things that are not working at this point. First, customization of the WMS via wmsConfig.xml, and second, obtaining images from the Godiva interface using the "Export to PNG" link. I hope to get the first knocked out in the next week or two; the second is a bug in the Google Web Toolkit code used to generate Godiva, and I'll need the Reading folks to check that one out.

@rsignell-usgs
Copy link
Contributor

@jbzambon, note you can also test on ncwms2 by running a standalone docker container if you prefer:
https://hub.docker.com/r/axiom/ncwms

@jbzambon
Copy link
Author

jbzambon commented Jul 30, 2019

I was able to get a bare THREDDS v.5.0.0 beta installed on a fresh AWS instance. Unfortunately, I'm getting an error loading WMS, although not the same error. The page that loads makes no reference to 2^31...

<ServiceExceptionReport xmlns="http://www.opengis.net/ogc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.3.0" xsi:schemaLocation="http://www.opengis.net/ogc http://schemas.opengis.net/wms/1.3.0/exceptions_1_3_0.xsd">
<ServiceException>
Problem creating dataset nwa/v1/NWA_v1_best.ncd at fmrc:NWA_v1 Cause: nLon (=0) and nLat (=0) must be positive and > 0
</ServiceException>
</ServiceExceptionReport>

Interestingly, threddsServlet.log doesn't show anything.

Apache's log (localhost_access_log.2019-07-30.txt ) shows
152.1.31.163 - - [30/Jul/2019:15:09:08 +0000] "GET /thredds/wms/nwa/v1/NWA_v1_best.ncd?service=WMS&version=1.3.0&request=GetCapabilities HTTP/1.1" 400 530

OPeNDAP seems to work, again. Endpoint: http://54.81.33.213:8080/thredds/dodsC/nwa/v1/NWA_v1_best.ncd.html

WMS Endpoint: http://54.81.33.213:8080/thredds/wms/nwa/v1/NWA_v1_best.ncd?service=WMS&version=1.3.0&request=GetCapabilities

(This is the same exact data file that was loaded on the v.4 THREDDS instance.)

Any more ideas? Thanks again!

@lesserwhirls
Copy link
Collaborator

So I think the failing WMS response is correct and very close to the one in 4.6, although not super helpful on its own. If I look at the OPeNDAP output for lat and lon (with a stride of 20 for the sake of response size):

http://54.81.33.213:8080/thredds/dodsC/nwa/v1/NWA_v1_best.ncd.ascii?lon%5B0:20:4199%5D%5B0:20:4099%5D,lat%5B0:20:4199%5D%5B0:20:4099%5D

both have zero for all of their values. In this case, WMS is wigging out because there are effectively no latitudes and longitudes, so it cannot create a lookup table between index space and lat/lon space. Because this collection is setup as an FMRC and only exposing the best dataset, it's hard to figure out where things are going wrong from this end (I'd really need to see the FMRC config xml and some of the dataset granules).

@jbzambon
Copy link
Author

jbzambon commented Aug 1, 2019

Here's some things that might help (?) Please let me know if there is anything else I can send. Thanks again!

fmrc.log...

[2019-07-30T15:07:06.400+0000] INFO  thredds.featurecollection.InvDatasetFeatureCollection: FeatureCollection added = FeatureCollectionConfig name ='NWA v1' collectionN
ame='NWA_v1' type='FMRC'
  spec='/thredds_data/nwa_v1/nwa_.*\.nc$'
  timePartition =directory
  updateConfig =UpdateConfig{userDefined=true, recheckAfter='null', rescan='0 5 * * * ? *', triggerOk=true, updateType=test}
  tdmConfig =UpdateConfig{userDefined=false, recheckAfter='null', rescan='null', triggerOk=true, updateType=test}
  ProtoConfig{choice=Penultimate, change='null', param='null', outerNcml='null', cacheAll=true}
  hasInnerNcml =true
  fmrcConfig =FmrcConfig: regularize=false datasetTypes=[Best]

[2019-07-30T15:07:08.562+0000] INFO  ucar.nc2.ft.fmrc.FmrcDataset: FMRCDataset build Proto cache >10M var= mask(lon=4200, lat=4100)
[2019-07-30T15:07:08.563+0000] INFO  ucar.nc2.ft.fmrc.FmrcDataset: FMRCDataset build Proto cache >10M var= lon(lon=4200, lat=4100)
[2019-07-30T15:07:08.563+0000] INFO  ucar.nc2.ft.fmrc.FmrcDataset: FMRCDataset build Proto cache >10M var= lat(lon=4200, lat=4100)

catalog.xml...

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="THREDDS Server Default Catalog : You must change this to fit your server!"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
           http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.6.xsd">

  <service name="all" base="" serviceType="compound">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="dap4" serviceType="DAP4" base="/thredds/dap4/" />
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/" />
    <!--service name="wcs" serviceType="WCS" base="/thredds/wcs/" /-->
    <service name="wms" serviceType="WMS" base="/thredds/wms/" />
    <!--service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/" /-->
  </service>

  <service name="dap" base="" serviceType="compound">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="dap4" serviceType="DAP4" base="/thredds/dap4/" />
  </service>

<dataset name="Northwest Atlantic (US East Coast, Gulf of Mexico and Caribbean Sea)">
    <metadata inherited="true">
        <serviceName>all</serviceName>
        <dataType>Grid</dataType>
        <dataFormat>netCDF</dataFormat>
    </metadata>
    <featureCollection name="NWA v1" featureType="FMRC" harvest="true" path="nwa/v1">
        <collection spec="/thredds_data/nwa_v1/nwa_.*\.nc$" />
        <update startup="true" rescan="0 5 * * * ? *" trigger="allow" />
        <fmrcConfig regularize="false" datasetTypes="Best" />
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="true" />
    </featureCollection>
</dataset>

</catalog>

@lesserwhirls
Copy link
Collaborator

What do each of the individual netCDF files look like? How are they laid out on disk? If you update the <fmrcConfig> datasetTypes attribute to include Files, I should be able to see the individual granules remotely.

@jbzambon
Copy link
Author

jbzambon commented Aug 1, 2019

Replaced "Best" with "Files" in <fmrcConfig> datasetTypes...

$ diff catalog.xml catalog.xml_orig 
32c32
<         <fmrcConfig regularize="false" datasetTypes="Files" />
---
>         <fmrcConfig regularize="false" datasetTypes="Best" />

Currently, there is only one file in the directory that's being sampled. Here's the ncdump of that file...

ncdump -h /thredds_data/nwa_v1/nwa_20190729.nc 
netcdf nwa_20190729 {
dimensions:
	time = UNLIMITED ; // (21 currently)
	lon = 4200 ;
	lat = 4100 ;
variables:
	float lon(lon, lat) ;
		lon:long_name = "longitude" ;
		lon:units = "degree_east" ;
	float lat(lon, lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degree_north" ;
	float mask(lon, lat) ;
		mask:long_name = "mask" ;
		mask:flag_values = 0., 1. ;
		mask:flag_meanings = "land water" ;
		mask:coordinates = "lon lat" ;
	float time(time) ;
		time:long_name = "time since initialization" ;
		time:units = "seconds since 2000-01-01 00:00:00" ;
		time:calendar = "gregorian" ;
		time:field = "time, scalar, series" ;
	float sst(time, lon, lat) ;
		sst:long_name = "sea_surface_temperature" ;
		sst:units = "Kelvin" ;
		sst:time = "time" ;
		sst:coordinates = "lon lat time" ;
		sst:field = "temperature, scalar, series" ;
		sst:_FillValue = 1.e+37f ;
	float chl(time, lon, lat) ;
		chl:long_name = "mass_concentration_of_chlorophyll_in_sea_water" ;
		chl:units = "kg m-3" ;
		chl:time = "time" ;
		chl:coordinates = "lon lat time" ;
		chl:field = "chl, scalar, series" ;
		chl:_FillValue = 1.e+37f ;
	float gsst(time, lon, lat) ;
		gsst:long_name = "gradient sea_surface_temperature" ;
		gsst:units = "Kelvin meter-1" ;
		gsst:time = "time" ;
		gsst:coordinates = "lon lat time" ;
		gsst:field = "temperature, scalar, series" ;
		gsst:_FillValue = 1.e+37f ;
	float gchl(time, lon, lat) ;
		gchl:long_name = "gradient mass_concentration_of_chlorophyll_in_sea_water" ;
		gchl:units = "kg m-4" ;
		gchl:time = "time" ;
		gchl:coordinates = "lon lat time" ;
		gchl:field = "chl, scalar, series" ;
		gchl:_FillValue = 1.e+37f ;
	float u(time, lon, lat) ;
		u:long_name = "eastward_sea_water_velocity" ;
		u:units = "meter second-1" ;
		u:time = "time" ;
		u:coordinates = "lon lat time" ;
		u:field = "u, scalar, series" ;
		u:_FillValue = 1.e+37f ;
	float v(time, lon, lat) ;
		v:long_name = "northward_sea_water_velocity" ;
		v:units = "meter second-1" ;
		v:time = "time" ;
		v:coordinates = "lon lat time" ;
		v:field = "v, scalar, series" ;
		v:_FillValue = 1.e+37f ;

// global attributes:
		:Conventions = "CF-1.0" ;
		:title = "NWA Domain" ;
		:_CoordinateModelRunDate = "2019-07-29T00:00:00Z" ;
}

Thanks again!

@rsignell-usgs
Copy link
Contributor

I wonder whether @jonblower has ideas here...

@lesserwhirls
Copy link
Collaborator

Looks like a good old fashion bug, which appears to have been introduced while getting rid of a one line if statement.

cleanup fail

Specifically here:

if (orgV.getSize() > 10 * 1000 * 1000) {
logger.info("FMRCDataset build Proto cache >10M var= "+orgV.getNameAndDimensions());
} else {
v.setCachedData(orgV.read()); // read from original - store in proto
}

A one line if statement to control a log message turned into a full if / else statement by accident. The net effect is that the data from the lat/lon variables, because they contain more than 10 million points, are not being cached (thus all values were zero in the virtual FMRC dataset).

That's an easy fix, and I can backport it to v4.6 as well.

@lesserwhirls
Copy link
Collaborator

Transferring issue to https://github.com/Unidata/netcdf-java

@lesserwhirls lesserwhirls transferred this issue from Unidata/thredds Aug 2, 2019
lesserwhirls added a commit to lesserwhirls/netcdf-java that referenced this issue Aug 2, 2019
lesserwhirls added a commit to lesserwhirls/thredds that referenced this issue Aug 2, 2019
This was referenced Aug 2, 2019
lesserwhirls added a commit to lesserwhirls/thredds that referenced this issue Aug 2, 2019
@jbzambon
Copy link
Author

jbzambon commented Aug 3, 2019

Seems to be working great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants