MARACOOS ROMS ESPreSSO #31

Closed
johnwilkin opened this issue Mar 15, 2017 · 20 comments

Comments

@johnwilkin

johnwilkin commented Mar 15, 2017

I'm looking on https://data.ioos.us for the MARACOOS Rutgers ROMS ocean forecast (ESPreSSO) that was previously on the NGDC geoportal. Maybe I'm not searching effectively, but I believe we should be there somewhere. The end points at http://tds.marine.rutgers.edu/thredds/roms/espresso/2013_da/catalog.html have not moved in a while.

@rsignell-usgs
Member

rsignell-usgs commented Mar 15, 2017

The old NGDC catalog system harvested much of the metadata themselves and used a stand-alone crawler that worked on THREDDS servers without ISO enabled.

The new IOOS catalog relies on the regions to populate folders of metadata, and perhaps MARACOOS is using a crawler that expects the ISO services to be enabled.

I just noticed that the catalog @johnwilkin specified does not have the THREDDS ISO metadata services enabled. For example:
http://tds.marine.rutgers.edu/thredds/catalog/roms/espresso/2013_da/his/catalog.html?dataset=roms/espresso/2013_da/his/ESPRESSO_Real-Time_v2_History_Best
under "Access" lists only OpenDAP, WMS and NetcdfSubset services.

Luckily this is a simple fix. We just ask @rjdave to add an extra line to the relevant catalogs and restart thredds!

<service name="allServices" serviceType="Compound" base="">
        <service name="ncdods" serviceType="OpenDAP" base="/thredds/dodsC/"/>
        <service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/grid/"/>
        <service name="wms" serviceType="WMS" base="/thredds/wms/"/>
        <service name="iso" serviceType="ISO" base="/thredds/iso/"/>
</service>
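
One way to confirm that a catalog advertises the ISO service is to list the `serviceType` values it declares. The following is a minimal sketch, assuming the catalog XML has already been downloaded as a string; `SAMPLE_CATALOG`, `service_types` and `has_iso` are names made up for this illustration, and the sample simply wraps the service block above in a `<catalog>` element using the standard THREDDS InvCatalog namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the compound service block from this comment,
# wrapped in a minimal THREDDS catalog element.
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="allServices" serviceType="Compound" base="">
    <service name="ncdods" serviceType="OpenDAP" base="/thredds/dodsC/"/>
    <service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/grid/"/>
    <service name="wms" serviceType="WMS" base="/thredds/wms/"/>
    <service name="iso" serviceType="ISO" base="/thredds/iso/"/>
  </service>
</catalog>
"""


def service_types(catalog_xml):
    """Return the set of serviceType values declared anywhere in a catalog."""
    root = ET.fromstring(catalog_xml)
    types = set()
    for elem in root.iter():
        # Compare local names so the check works with or without a namespace.
        if elem.tag.rsplit("}", 1)[-1] == "service":
            types.add(elem.get("serviceType"))
    return types


def has_iso(catalog_xml):
    """True if the catalog declares a service of type ISO."""
    return "ISO" in service_types(catalog_xml)


print(sorted(service_types(SAMPLE_CATALOG)))
print("ISO enabled:", has_iso(SAMPLE_CATALOG))
```

Declaring the service in the catalog is only part of the picture: the datasets also have to reference a service that includes it, which is why the compound `allServices` entry is the convenient place to add it.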

@rjdave

rjdave commented Mar 15, 2017

ISO is now enabled on the Espresso Real-Time v2 datasets.

@rsignell-usgs
Member

rsignell-usgs commented Mar 15, 2017

@rjdave, nice! Thanks!

@mwengren and @lukecampbell, I searched https://registry.ioos.us/harvests for MARACOOS and found that the catalog is harvesting these two WAFs:
http://tds.maracoos.org/iso/
http://sos.maracoos.org/maracoos-iso/

Hopefully the scripts that are populating http://tds.maracoos.org/iso/ will now be able to extract ISO records from http://tds.marine.rutgers.edu/thredds/roms/espresso/2013_da/catalog.html.

If not, here's a script that uses Axiom's Docker thredds_iso_harvester container to extract only the "Best time series" ISO records:

docker run --rm -v $(pwd)/harvest_espresso.py:/srv/harvest.py -v $(pwd)/iso:/srv/iso \
  axiom/thredds_iso_harvester

where harvest_espresso.py is:

from thredds_iso_harvester.harvest import ThreddsIsoHarvester
from thredds_crawler.crawl import Crawl

# Start from the crawler's default skip patterns.
skip = Crawl.SKIPS
# Keep only the "Best time series" datasets.
select = [r'.*_Best']

ThreddsIsoHarvester(
    catalog_url="http://tds.marine.rutgers.edu/thredds/roms/espresso/catalog.xml",
    skip=skip, select=select,
    out_dir="/srv/iso/espresso")

This should produce two ISO records in the subdirectory ./iso/espresso:

$ ls iso/espresso
roms_espresso_2013_da_avg_ESPRESSO_Real-Time_v2_Averages_Best.iso.xml
roms_espresso_2013_da_his_ESPRESSO_Real-Time_v2_History_Best.iso.xml
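
To see why a pattern like `.*_Best` yields only the two "Best" records, it can help to try it against a few dataset IDs. This is only a sketch: it assumes the crawler tests each `select` pattern against the dataset ID from the start of the string (as `re.match` does), which should be checked against the thredds_crawler source. The first ID is the one quoted earlier in this thread, the Averages ID is inferred from the output file name, and the `fmrc.ncd` and `2009` entries are hypothetical stand-ins for datasets that should not be selected.

```python
import re

select = [r".*_Best"]

dataset_ids = [
    "roms/espresso/2013_da/his/ESPRESSO_Real-Time_v2_History_Best",
    "roms/espresso/2013_da/avg/ESPRESSO_Real-Time_v2_Averages_Best",
    # Hypothetical IDs, used only to show what the pattern rejects.
    "roms/espresso/2013_da/his/ESPRESSO_Real-Time_v2_History_fmrc.ncd",
    "espresso_2009_da_history",
]


def is_selected(dataset_id, patterns):
    """True if any pattern matches from the start of the dataset ID."""
    return any(re.match(p, dataset_id) for p in patterns)


kept = [d for d in dataset_ids if is_selected(d, select)]
for d in kept:
    print(d)
```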

@rsignell-usgs
Member

Still getting nothing here: https://data.ioos.us/dataset?q=espresso
@kknee, can you investigate?

@rsignell-usgs
Member

I don't know what happened, but we now have espresso ☕ !
https://data.ioos.us/dataset?q=espresso

[Screenshot: 2017-03-24_13-32-48]

@mwengren
Member

I think the MARACOOS WAF scripts were having some issues recently (last couple of weeks), but the TDS changes combined with the WAF process being fixed probably caused those forecasts to be added. I looked at the Catalog harvest logs and a couple of new datasets were added on the 20th.

@rsignell-usgs
Member

rsignell-usgs commented Mar 27, 2017

I'm reopening this issue because I just noticed that the catalog currently contains two Espresso datasets that appear to be identical, both being the history files from the 2013-present aggregation:
https://data.ioos.us/dataset?q=espresso

So we appear to have only 1 dataset being discovered twice instead of the 4 datasets we should have.

I gchatted with @rjdave and he has now enabled ISO services for the pre-2013 datasets in this catalog:
http://tds.marine.rutgers.edu/thredds/roms/espresso/2009_da/catalog.html

So with the dockerized thredds_iso_harvester, called here in a bash script I'm calling do_crawl, which is:

#!/bin/bash
# Usage: ./do_crawl <harvest_script.py>
# Mounts the given script as /srv/harvest.py and ./iso as the output directory.
docker run --rm -v "$(pwd)/$1":/srv/harvest.py -v "$(pwd)/iso":/srv/iso \
  axiom/thredds_iso_harvester

and invoked thusly:

./do_crawl harvest_espresso.py

where harvest_espresso.py is:

from thredds_iso_harvester.harvest import ThreddsIsoHarvester
from thredds_crawler.crawl import Crawl

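# Start from the crawler's default skip patterns.
skip = Crawl.SKIPS
# Keep the "Best time series" datasets plus everything under 2009_da.
select = [r'.*_Best', r'.*2009_da.*']

ThreddsIsoHarvester(
    catalog_url="http://tds.marine.rutgers.edu/thredds/roms/espresso/catalog.xml",
    skip=skip, select=select,
    out_dir="/srv/iso/espresso")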

I now get 4 records!

(IOOS3) rsignell@gamone:~> ls iso/espresso
espresso_2009_da_averages.iso.xml
espresso_2009_da_history.iso.xml
roms_espresso_2013_da_avg_ESPRESSO_Real-Time_v2_Averages_Best.iso.xml
roms_espresso_2013_da_his_ESPRESSO_Real-Time_v2_History_Best.iso.xml

Woohoo!

Can whoever is responsible for the MACOORA WAF please enable?

@kknee

kknee commented Mar 27, 2017

@rsignell-usgs the two records referenced are actually slightly different

  • 2013-present History Best (1D time) aggregation
  • 2013-present History FMRC (2D time) aggregation

Now that the 2009-2013 dataset has had ISO support enabled, we added

  • 2009-2013 History Best (1D time) aggregation
    ***Note, the 2009-2013 dataset does not have an FMRC aggregation

We also added:

  • 2013-present Averages Best (1D time) aggregation
  • 2013-present Averages FMRC (2D time) aggregation
  • 2009-2013 Averages Best (1D time) aggregation
    ***Note, the 2009-2013 dataset does not have an FMRC aggregation

@rsignell-usgs
Member

rsignell-usgs commented Mar 27, 2017

@kknee, I would argue against providing the 2D time aggregations -- they are not CF-compliant and the WMS services don't work properly. The 1D "Best" from the FMRC aggregation is all that most users expect to see. These are CF-compliant and match the structure of joinExisting aggregations. So the 2009-2013 and 2013-present datasets would then have the same structure (and the same structure as all the other aggregated modeling data in the IOOS catalog).

@rjdave

rjdave commented Mar 28, 2017

I would also recommend against providing the 2D time aggregations, both for the reasons @rsignell-usgs mentioned and because we don't feel users should rely on the accuracy or validity of the first 24 hours (hindcast) of each run. The first 24 hours of hindcast is more of a spin-up phase, which is why we don't include them in our "Best" aggregation.
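
The difference between the 2D and 1D views can be sketched in a few lines. In a 2D (FMRC) aggregation every value is addressed by a run start time and a forecast offset; a "Best" series collapses that to a single time axis by keeping, for each valid time, the most recent run that covers it. The code below is a simplified illustration under stated assumptions, not the THREDDS FMRC implementation: the run dates and 12-hourly offsets are hypothetical, and `skip_hours` is a made-up parameter standing in for the exclusion of the first 24 hours of each run described above.

```python
from datetime import datetime, timedelta


def best_time_series(runs, skip_hours=0):
    """Collapse {run_start: [offset_hours, ...]} into {valid_time: run_start}.

    For each valid time, the most recent run covering it wins. Offsets
    below skip_hours are ignored (for example, a spin-up period).
    """
    best = {}
    for run_start in sorted(runs):
        for offset in runs[run_start]:
            if offset < skip_hours:
                continue
            valid_time = run_start + timedelta(hours=offset)
            # Runs are visited oldest first, so later runs overwrite earlier ones.
            best[valid_time] = run_start
    return dict(sorted(best.items()))


# Hypothetical runs: two daily runs, each with 12-hourly output out to 48 h.
runs = {
    datetime(2017, 3, 27): [0, 12, 24, 36, 48],
    datetime(2017, 3, 28): [0, 12, 24, 36, 48],
}

series = best_time_series(runs, skip_hours=24)
for valid_time, run_start in series.items():
    print(valid_time.isoformat(), "<- run", run_start.date())
```

The result has one entry per valid time, which is the same shape as a joinExisting aggregation and is why the 1D view is the one that fits alongside the other model datasets.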

@kknee

kknee commented Mar 28, 2017

@brianmckenna will update the WAF today

@rsignell-usgs
Member

@kknee and @brianmckenna, thanks!

@johnwilkin
Author

johnwilkin commented Mar 28, 2017 via email

@rsignell-usgs
Member

I now see 6 entries for espresso in the IOOS catalog: https://data.ioos.us/dataset?q=espresso

The 2009-2013 averages and history datasets are now there, but unfortunately the title for both is simply "espresso".

@rjdave, can you please modify the titles for these in your catalog NcML to be more similar to the 2013-present datasets, for example, something like:
"ROMS ESPRESSO Real-Time Operational IS4DVAR Forecast System Version 1, 2009-2013 Average"
"ROMS ESPRESSO Real-Time Operational IS4DVAR Forecast System Version 1, 2009-2013 History"

@rsignell-usgs
Member

@kknee and @brianmckenna, the Rutgers Espresso model forecast ISO metadata from
http://tds.marine.rutgers.edu/thredds/roms/espresso/catalog.html
needs to be harvested every day with the new forecast info (the stop date), but it doesn't appear that is happening (or it's broken): the ESPRESSO records at http://tds.maracoos.org/iso/ were last updated 27-Mar-2017.

Can you please fix?

We discovered this because @rjdave updated their THREDDS catalog at 1200 EDT yesterday to improve the titles for the 2009-2013 espresso data, and you can see them in the ISO records here:
http://tds.marine.rutgers.edu/thredds/iso/roms/espresso/2009_da/his?catalog=http%3A%2F%2Ftds.marine.rutgers.edu%2Fthredds%2Froms%2Fespresso%2F2009_da%2Fcatalog.html&dataset=espresso_2009_da_history

but the new titles have not shown up in the IOOS catalog yet
https://data.ioos.us/dataset?q=guid%3A%22edu.rutgers.marine%3Aespresso_2009_da_averages%22
despite the catalog harvesting from http://tds.maracoos.org/iso/
at 1600 EDT yesterday.

@rsignell-usgs
Member

The Espresso forecast datasets on http://tds.maracoos.org/iso/ are still stuck on the 27th:
[Screenshot: 2017-03-31_8-39-41]

@kknee and @brianmckenna, these need to be updated daily!

@lukecampbell
Member

@rsignell-usgs the bottom two are not static documents and don't need to be updated daily. They are pure HTML anchors that link to the thredds service endpoints directly.
[Screenshot: screen_shot_2017-03-31_at_9_21_14_am]

@brianmckenna

2009 datasets are no longer updating, so are assumed static.

2013 updates continuously, please check the content of the ISO.
At this moment <endPosition> is:

  • 2017-04-02T12:00:00Z for Averages TDS
  • 2017-04-03T12:00:00Z for History TDS

which both reflect the TDS extents. If you are seeing otherwise, please let me know.

If the HTML is being parsed instead of the ISO, please let me know that as well and I'll see what can be done.

Note: IOOS catalog last updated ~13 hours ago, so both still reflect yesterday's extents.
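
Checking the content of the ISO record rather than the file date can be automated by reading `<endPosition>` directly. The following is a minimal sketch, assuming the record has already been fetched as a string and that the timestamp uses the `YYYY-MM-DDTHH:MM:SSZ` form shown above. `SAMPLE_ISO` is a cut-down, hypothetical fragment (a real record has many more elements), and `end_position`, `is_stale` and the one-day threshold are names and values chosen for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical fragment holding only the temporal extent, with the History
# end time quoted above.
SAMPLE_ISO = """<gmd:EX_TemporalExtent xmlns:gmd="http://www.isotc211.org/2005/gmd"
                       xmlns:gml="http://www.opengis.net/gml/3.2">
  <gmd:extent>
    <gml:TimePeriod>
      <gml:endPosition>2017-04-03T12:00:00Z</gml:endPosition>
    </gml:TimePeriod>
  </gmd:extent>
</gmd:EX_TemporalExtent>
"""


def end_position(iso_xml):
    """Return the first endPosition in the record as an aware UTC datetime."""
    root = ET.fromstring(iso_xml)
    for elem in root.iter():
        # Match on the local name so the GML namespace version does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "endPosition":
            text = (elem.text or "").strip()
            parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    return None


def is_stale(iso_xml, now, max_age=timedelta(days=1)):
    """True if the record has no end time or it lags 'now' by more than max_age."""
    end = end_position(iso_xml)
    return end is None or now - end > max_age


print(end_position(SAMPLE_ISO))
print(is_stale(SAMPLE_ISO, now=datetime(2017, 4, 3, 0, 0, tzinfo=timezone.utc)))
```

For a forecast the end time normally lies ahead of the current time, so a record whose end time trails the clock by more than a day is a reasonable sign that harvesting has stopped.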

@rsignell-usgs
Member

@brianmckenna & @lukecampbell, ah, okay, my bad for seeing those file dates for the 2013 data and assuming they were not updating.

Could you please link the 2009 data to the actual ISO records (shown below) as well so that changes made to the metadata in there will appear? (we made some changes to the title and summary, for example)

http://tds.marine.rutgers.edu/thredds/iso/roms/espresso/2009_da/avg?catalog=http%3A%2F%2Ftds.marine.rutgers.edu%2Fthredds%2Froms%2Fespresso%2F2009_da%2Fcatalog.html&dataset=espresso_2009_da_averages

http://tds.marine.rutgers.edu/thredds/iso/roms/espresso/2009_da/his?catalog=http%3A%2F%2Ftds.marine.rutgers.edu%2Fthredds%2Froms%2Fespresso%2F2009_da%2Fcatalog.html&dataset=espresso_2009_da_history
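
These ISO links follow a regular pattern: the dataset's path under `/thredds/iso/`, plus `catalog` and `dataset` query parameters with the catalog URL percent-encoded. A small sketch of building one is below; `iso_record_url` and its parameter names are made up for this illustration, and the pattern is inferred only from the two URLs above rather than from THREDDS documentation.

```python
from urllib.parse import urlencode


def iso_record_url(tds_base, dataset_path, catalog_url, dataset_id):
    """Build a THREDDS ISO link from its parts, percent-encoding the query."""
    query = urlencode({"catalog": catalog_url, "dataset": dataset_id})
    return f"{tds_base}/iso/{dataset_path}?{query}"


url = iso_record_url(
    tds_base="http://tds.marine.rutgers.edu/thredds",
    dataset_path="roms/espresso/2009_da/his",
    catalog_url="http://tds.marine.rutgers.edu/thredds/roms/espresso/2009_da/catalog.html",
    dataset_id="espresso_2009_da_history",
)
print(url)
```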

@rsignell-usgs
Member

Looks like espresso runs are now in tip-top condition in the catalog:
[Screenshot: 2017-04-07_7-30-19]

Thanks everyone! 🎉


7 participants