New Task: Establish an ATN WAF of ISO19115 metadata records for IOOS Data Catalog to harvest #52

Open
2 tasks
MathewBiddle opened this issue May 1, 2024 · 10 comments
Assignees
Labels
ATN (Issues relating to the Animal Telemetry Network), enhancement (New feature or request)

Comments

@MathewBiddle
Contributor

Who is requesting this?

@ioos/marine-life

What is being requested?

Connect ATN and MBON into IOOS DMAC. Coordinate with IOOS Catalog developers (POC: @mwengren) on how ATN and/or MBON portals could be harvested for data.ioos.us. Guidance for the process to add records is documented at https://ioos.github.io/catalog/

What is the requested deadline and why?

No response

What is the current status quo (i.e., what happens if this does not get done)?

ATN and MBON datasets won't show up in data.ioos.us.
Marine Life will not be meeting IOOS DMAC requirements, which call for data to be discoverable in data.ioos.us.

What indicates this is done (i.e., how do we know this is complete)?

  • ATN catalog is discoverable in data.ioos.us
  • MBON catalog is discoverable in data.ioos.us

Provide a description or any other important information.

xref:

@mwengren
Member

mwengren commented Jun 5, 2024

Copying my comments from ioos/ckanext-ioos-theme#237 (comment) below:

AFAIK IOOS is required to furnish ISO XML metadata (or perhaps DCAT JSON, not 100% sure on that alternative) to NOAA for inclusion in NOAA's enterprise data inventories for all of our publicly-available data/services.

For all of IOOS' non-bio data, it's been fairly straightforward to do this, as most of the software we use has been developed to be able to output an ISO XML metadata representation of the datasets it serves. Since that isn't the case for OBIS, MBON, or ATN (I believe), that's something we'll need to address both for including those data in IOOS Catalog given its current capabilities, and for sending up the chain to NOAA to meet requirements.

It may be that leveraging IOOS Catalog and converting the various bio data formats to ISO XML format isn't the best approach to meeting NOAA data inventory requirements. If there are better, simpler ways to furnish these metadata to NOAA that I'm not aware of, we should consider those options. Catalog has been our solution to date, but primarily because of the pre-existing metadata format support and compatibility.

Ideally, we can have a comprehensive inventory of 100% of IOOS' data in Catalog, and I think we should still aim for that goal, but we need to understand better what the challenges for that might be wrt ATN, MBON, or other bio/Marine Life data.

@MathewBiddle
Contributor Author

Thanks @mwengren.

For ATN, at some point, we hope to add non-embargoed data to an ATN ERDDAP which could be an easy pathway for that observing method. See #44

For MBON, we are encouraging the MBON projects to work with RAs to host the raw data on an RA ERDDAP (or other web service as applicable). Most of the RA ERDDAPs are already being harvested, hence the push for that collaboration. Below is an example:

Another wrinkle in the whole pipeline is that OBIS-USA is being archived at NCEI on a quarterly basis. Part of our guidance is to submit data to OBIS-USA. While that metadata record is not available through the IOOS Catalog, it is available through the various NOAA and higher Catalogs. So, does that meet our NOAA data inventory requirements?? See links below:

The data flow diagram might help illustrate all the nuances https://ioos.github.io/mbon-docs/mbon-data-flow.html

@mwengren
Member

@MathewBiddle That makes sense on the data flow and connection in with the RA ERDDAPs, I recall that plan now... thanks for adding the example.

I think the OBIS-USA/NCEI archive probably does meet the NOAA data publishing/open data requirements for those data - at least from what I understand.

I think our goal should be to include both access points (NCEI archive and RA ERDDAP) at the NOAA Catalog level (i.e. OneStop). The IOOS Catalog should include all data access services provided by the RAs, or other IOOS DACs, that are funded and supported by IOOS.

Having two separate metadata records for the same dataset should be OK as well as they'll be describing different endpoints to access the same data, presumably. Ideally there would be a way to relate each metadata record to the other within the NOAA Catalog, but I'm not sure that is technically possible at present. That might be a good requirement to share with the OneStop team though.

I guess the one scenario that seems to be a potential gap where IOOS-funded bio data might not be represented in IOOS Catalog is if a provider is not serving their data via RA ERDDAP, but are aligning them to Darwin Core and submitting to NCEI.

Ideally, we could also represent those raw data access points, whatever they might be, in IOOS Catalog as well, even if they would be technically meeting the NOAA open data publishing guidelines via OBIS/NCEI archive pathway.

I don't know how much of a priority or how common this is... maybe would provide justification to encourage those providers to work with an RA to publish to ERDDAP, however.

@laurabrenskelle
Contributor

@mwengren Is there a reason you couldn't share the RA ERDDAP link as another data access link in the collection metadata record at NCEI? It doesn't seem ideal to have two collection records for the same dataset in OneStop. Here is an example: https://data.noaa.gov/onestop/collections/details/573b7dc1-7d06-4fdc-a134-056c112c2260

@MathewBiddle
Contributor Author

I guess the one scenario that seems to be a potential gap where IOOS-funded bio data might not be represented in IOOS Catalog is if a provider is not serving their data via RA ERDDAP, but are aligning them to Darwin Core and submitting to NCEI.

I think this might be more common with cross funded efforts, like MBON. Some projects use EDI and Arctic Data Center as their repositories (maybe BCO-DMO too).

MathewBiddle changed the title from "New Task: Connect ATN and MBON into IOOS DMAC" to "New Task: Connect ATN and MBON datasets into IOOS Data Catalog" on Dec 11, 2024
@MathewBiddle
Contributor Author

MathewBiddle commented Dec 11, 2024

This task has evolved a little bit since that last revisit in June. This activity should be initially focused on how to get ATN "data" into the IOOS Data Catalog. I put quotes around "data" because we should define what we mean by that term in the context of ATN.

For now, MBON datasets are making it to the IOOS Data Catalog via RA ERDDAPs (which are being harvested into the catalog). So, there is no effort required to make MBON datasets appear in the IOOS Data Catalog.

Next steps:

edit: data -> metadata (the IOOS Data Catalog only harvests metadata)

@MathewBiddle
Contributor Author

Updating the title to be more reflective of the activity.

MathewBiddle changed the title from "New Task: Connect ATN and MBON datasets into IOOS Data Catalog" to "New Task: Connect ATN metadata records into IOOS Data Catalog" on Dec 18, 2024
MathewBiddle removed the "MBON" label (Issues relating to the Marine Biodiversity Observation Network) on Dec 18, 2024
@MathewBiddle
Contributor Author

I'm understanding this task better as we continue to have conversations with @mwengren about the IOOS Data Catalog.

The requirement for the IOOS Data Catalog is:

  • A WAF: WAF stands for web-accessible folder. It is any folder whose file contents are exposed to the outside world via a web server.
  • Files contained within the WAF should be XML files conforming to the ISO 19115-2 metadata standard.
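
To make the two requirements above concrete, here is a rough sketch of a pre-harvest sanity check on a WAF, using only the Python standard library. It assumes the WAF is served as a plain HTML directory listing (as Apache or nginx auto-indexes typically are) and that records use the ISO 19139 XML encoding, where an ISO 19115-2 record has a `gmi:MI_Metadata` root and a plain ISO 19115 record has a `gmd:MD_Metadata` root. The function and class names are made up for illustration, and checking the root element is not a substitute for the schema validation the Registry performs.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 XML encoding of ISO 19115 and 19115-2.
GMD_NS = "http://www.isotc211.org/2005/gmd"
GMI_NS = "http://www.isotc211.org/2005/gmi"


class XmlLinkParser(HTMLParser):
    """Collect href values ending in .xml from a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.xml_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xml"):
            self.xml_links.append(href)


def list_waf_records(listing_html):
    """Return the .xml links found in a WAF directory listing (hypothetical helper)."""
    parser = XmlLinkParser()
    parser.feed(listing_html)
    return parser.xml_links


def iso_root_kind(xml_text):
    """Classify a record by its root element: '19115-2', '19115', or None."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    if root.tag == f"{{{GMI_NS}}}MI_Metadata":
        return "19115-2"
    if root.tag == f"{{{GMD_NS}}}MD_Metadata":
        return "19115"
    return None
```

In practice the listing HTML and each record would be fetched over HTTP first; that step is left out so the sketch stays self-contained.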

More details:

The first step to using the Registry is to request an account, details on that can be found on the user accounts page. Once your account is approved and your IOOS organization affiliation confirmed, the next step is to configure the list of WAF and/or CS-W harvest sources that contain metadata describing your organization’s datasets. Detail on how to do this can be found on the managing harvests page.

The Registry provides a high degree of control over managing, troubleshooting (including displaying ISO XML validation and CKAN harvesting errors), and previewing dataset metadata for publishing in the IOOS Catalog.

https://ioos.github.io/catalog/pages/registry/

MathewBiddle changed the title from "New Task: Connect ATN metadata records into IOOS Data Catalog" to "New Task: Establish an ATN WAF of ISO19115 metadata records for IOOS Data Catalog to harvest" on Jan 31, 2025
@MathewBiddle
Contributor Author

Clarified the title to reflect what the activity actually is.

MathewBiddle marked this as a duplicate of #115 on Feb 3, 2025
@MathewBiddle
Contributor Author

related #44

Need to identify which route we take:

  1. ATN -> ERDDAP -> IOOS Data Catalog
  2. ATN -> WAF -> IOOS Data Catalog
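
For what it's worth, the two routes may overlap more than the list suggests. As far as I understand ERDDAP's documented behaviour, it generates an ISO 19115 record per dataset (the `.iso19115` file type) and also exposes those records as a web-accessible folder under `metadata/iso19115/xml/`. The sketch below only builds those URLs; the base URL and dataset ID are placeholders, and the paths should be verified against a real ERDDAP instance before relying on them.

```python
from urllib.parse import quote


def erddap_iso_record_url(base_url, dataset_id, protocol="tabledap"):
    """URL of the ISO 19115 record ERDDAP generates for one dataset.

    Assumes the standard ERDDAP layout: <base>/<protocol>/<datasetID>.iso19115
    """
    base = base_url.rstrip("/")
    return f"{base}/{protocol}/{quote(dataset_id, safe='')}.iso19115"


def erddap_iso_waf_url(base_url):
    """URL of the ISO 19115 web-accessible folder ERDDAP is expected to expose."""
    return f"{base_url.rstrip('/')}/metadata/iso19115/xml/"


# Placeholder values for illustration only; not a real ATN endpoint.
base = "https://example.org/erddap/"
print(erddap_iso_record_url(base, "atn_example_dataset"))
print(erddap_iso_waf_url(base))
```

If that holds for an ATN ERDDAP, option 1 would produce a harvestable WAF as a side effect, while option 2 would require generating and hosting the ISO XML files some other way.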

Projects
Status: ToDo
Development

No branches or pull requests

4 participants