Create ARGO templates #16

Merged
merged 4 commits into from
Nov 19, 2024

Conversation

jmckenna
Collaborator

@jmckenna jmckenna commented Apr 19, 2024

  • templates generated through the meeting:

  • other notes from meeting:

    • use additionalProperty for other fields so that they are visible
    • include date when the data was updated
    • for spatialCoverage type, instead of polygon you can alternatively use box (bounding box coordinates)
    • be sure to validate through: https://validator.schema.org/
    • see other templates for type dataset
  • source metadata snippet for a Float:

      [
        {
          "_id": "4902112_m0",
          "data_type": "oceanicProfile",
          "data_center": "AO",
          "instrument": "profiling_float",
          "pi_name": [
            "BRECK OWENS",
            " STEVEN JAYNE",
            " P.E. ROBBINS"
          ],
          "platform": "4902112",
          "platform_type": "S2A",
          "fleetmonitoring": "https://fleetmonitoring.euro-argo.eu/float/4902112",
          "oceanops": "https://www.ocean-ops.org/board/wa/Platform?ref=4902112",
          "positioning_system": "GPS",
          "wmo_inst_type": "854"
        }
      ]
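
The additionalProperty note above can be illustrated against the Float snippet. This is a minimal Python sketch, not part of the templates in this PR: the helper name `to_additional_properties` and the choice to reuse each source key as the `propertyID` are assumptions made for illustration.

```python
def to_additional_properties(source: dict, skip: tuple = ("_id",)) -> list:
    """Wrap each source field in a schema.org PropertyValue so it stays visible."""
    properties = []
    for key, value in source.items():
        if key in skip:
            continue
        # Source strings such as " STEVEN JAYNE" carry stray whitespace; trim them.
        if isinstance(value, list):
            value = [v.strip() if isinstance(v, str) else v for v in value]
        elif isinstance(value, str):
            value = value.strip()
        properties.append({
            "@type": "PropertyValue",
            "propertyID": key,  # assumption: reuse the source key as the identifier
            "value": value,
        })
    return properties


# A subset of the Float metadata shown above.
float_metadata = {
    "_id": "4902112_m0",
    "platform_type": "S2A",
    "pi_name": ["BRECK OWENS", " STEVEN JAYNE", " P.E. ROBBINS"],
    "positioning_system": "GPS",
}
additional_property = to_additional_properties(float_metadata)
```

The resulting list could be assigned to the record's "additionalProperty" key; whether every source field deserves its own entry is a judgment call for the node.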

related to iodepo/odis-arch#404

@jmckenna jmckenna changed the title add ARGO templates Create ARGO templates Apr 19, 2024
@jmckenna jmckenna requested a review from pbuttigieg April 19, 2024 19:02
Collaborator

@pbuttigieg pbuttigieg left a comment

These are notes at this stage, and incomplete. Mark as draft until we have a complete and valid example.

@jmckenna jmckenna marked this pull request as draft April 23, 2024 12:57
@jmckenna
Collaborator Author

> These are notes at this stage, and incomplete. Mark as draft until we have a complete and valid example.

thanks, changed to Draft.

@bkatiemills

bkatiemills commented Apr 29, 2024

Can you please identify the mandatory fields in https://github.com/iodepo/odis-in/blob/master/dataGraphs/thematics/dataset/graphs/datasetTemplate.json? Many of them don't make sense for Argovis, I need a clearer picture of what's required versus optional in order to produce an MVP.

@bkatiemills

Some questions on specific properties in the dataset template:

  • url: we are an API-driven search service over large datasets; we do not provide links to blobs of entire datasets. The suggestion provided in meeting of linking to our visualization frontend for Argo is inappropriate, since we will be generating jsonld for every dataset we index, and not all of them appear in the frontend. What do other similar API driven services you index do?
  • keywords: please explain how these are used. Is there an existing ontology?
  • distribution: more discussion needed. I am not really keen on the suggestion of listing API calls to each week of Argo data (or any other way of chunking the data); we absolutely do not want people to think of Argovis as a platform to march through and download the entire dataset, which is exactly what this enumeration will encourage. So similar question to url: do you have examples from other API services, what would you recommend?
  • spatialCoverage: is there a jsonld for "everywhere between these two latitudes"?

Thanks for your help on this, I think the biggest sticking point is that we're more of a data service layer than a static blob of data, which makes some of these keys an awkward fit. Once we can resolve that in a way that respects and represents Argovis' intended usage, the rest will be pretty easy.

@jmckenna
Collaborator Author

jmckenna commented Apr 29, 2024

@bkatiemills Just to confirm, in your messages here you are pointing to the general dataset template, but the ARGO dataset template that we created together lives at datasetTemplate-ARGO.json

@jmckenna
Collaborator Author

@pbuttigieg pinging you here so that you notice @bkatiemills' questions above

@pbuttigieg
Collaborator

pbuttigieg commented Apr 29, 2024

@jmckenna thanks for the ping

@bkatiemills

  • url: we are an API-driven search service over large datasets; we do not provide links to blobs of entire datasets. The suggestion provided in meeting of linking to our visualization frontend for Argo is inappropriate, since we will be generating jsonld for every dataset we index, and not all of them appear in the frontend. What do other similar API driven services you index do?

the 'url' property is intended for something like a landing page for the dataset or any Web resource that's dedicated to that dataset.

if you don't produce these, you can omit this property

the suggestion in the meeting was more for a Service type (rather than Dataset): there, the url would point to the service's landing page

  • keywords: please explain how these are used. Is there an existing ontology?

You can use any semantic resource you think is appropriate, or just strings.

there's some documentation here, but there are a few updates pending, summarised below.

"keywords": [
  "string",
  {
    "@type": "DefinedTerm",
    "inDefinedTermSet": "http://purl.org/dc/terms/DCMIType",
    "termCode": "Image",
    "name": "Image",
    "identifier": "http://purl.org/dc/dcmitype/Image"
  }
]
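
A keywords array that mixes plain strings with DefinedTerm entries, as in the snippet above, could be assembled like this. It is a minimal Python sketch: the helper name `defined_term` and the fallback of `termCode` to the term's name are assumptions, not ODIS conventions.

```python
def defined_term(name: str, term_set: str, identifier: str, term_code: str = "") -> dict:
    """Build a schema.org DefinedTerm entry for a "keywords" array."""
    return {
        "@type": "DefinedTerm",
        "inDefinedTermSet": term_set,
        "termCode": term_code or name,  # assumption: default the code to the name
        "name": name,
        "identifier": identifier,
    }


keywords = [
    "Argo",  # plain strings remain valid alongside DefinedTerm entries
    defined_term(
        "Image",
        "http://purl.org/dc/terms/DCMIType",
        "http://purl.org/dc/dcmitype/Image",
    ),
]
```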
  • distribution: more discussion needed. I am not really keen on the suggestion of listing API calls to each week of Argo data (or any other way of chunking the data); we absolutely do not want people to think of Argovis as a platform to march through and download the entire dataset, which is exactly what this enumeration will encourage. So similar question to url: do you have examples from other API services, what would you recommend?

Perhaps we should discuss this in a call.

The idea of using the Dataset type is that users can find units of data that the node (your system) wants to highlight.

in this model, I would create an individual Dataset record for every chunk of Argo data you'd like others to see as an output of your service. An API call that retrieves that chunk is a valid value for contentUrl.

if you'd prefer not to share dataset-level records, that's fine too: you can just share one Service or WebAPI record for ArgoVis. this will reduce discoverability, but - as you say - may be more appropriate to guide users to the experience you want them to have.
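
The two options described above (one Dataset record per chunk with an API call as contentUrl, or a single WebAPI record for the whole service) might look like the following. This is a hedged Python sketch: the function names and every example.org URL are hypothetical placeholders, not real Argovis or ODIS endpoints.

```python
CONTEXT = {"@vocab": "https://schema.org/"}


def dataset_chunk_record(record_id: str, name: str, api_call: str) -> dict:
    """Option 1: a Dataset record for one chunk, retrievable through an API call."""
    return {
        "@context": CONTEXT,
        "@type": "Dataset",
        "@id": record_id,
        "name": name,
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": api_call,  # the API call that returns this chunk
        },
    }


def service_record(record_id: str, name: str, landing_page: str) -> dict:
    """Option 2: a single WebAPI record whose url is the service landing page."""
    return {
        "@context": CONTEXT,
        "@type": "WebAPI",
        "@id": record_id,
        "name": name,
        "url": landing_page,
    }


# Hypothetical identifiers and URLs, for illustration only.
chunk = dataset_chunk_record(
    "https://example.org/records/chunk-1",
    "Example chunk of profile data",
    "https://example.org/api/profiles?chunk=1",
)
service = service_record(
    "https://example.org/records/service",
    "Example data service",
    "https://example.org/",
)
```

The first option trades a larger number of records for discoverability of individual chunks; the second keeps a single entry point and steers users to the service itself.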

  • spatialCoverage: is there a jsonld for "everywhere between these two latitudes"?

I'd use the box - @jmckenna you're the FOSS4G dude, better advice? WKT or GeoJSON preferred.

@jmckenna
Collaborator Author

jmckenna commented Apr 29, 2024

@bkatiemills @pbuttigieg regarding the spatialCoverage, there are some very important points to be aware of:

  • schema.org expects lat,long (Y,X) coordinate pairs, which unfortunately is opposite of what the FOSS4G/spatial world uses
  • in fact, even this datasetTemplate-ARGO.json had an incorrect coordinate order (I fixed it now)
  • to provide specific bounds, the GeoShape box parameter is recommended, using lower-left and upper-right Y,X coordinates in the form "box": "miny minx maxy maxx", as follows:
"spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": "-90 -180 90 180"
        },
        "additionalProperty": {
            "@type": "PropertyValue",
            "propertyID": "http://dbpedia.org/resource/Spatial_reference_system",
            "value": "http://www.w3.org/2003/01/geo/wgs84_pos"
        }
},

I'm not sure if I answered your question, but keep that in mind anyway.
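
Because the Y,X order is easy to get wrong, a small helper can build the box string from the X,Y bounds that spatial tools usually produce. The following is a minimal Python sketch; the function names `schema_org_box` and `latitude_band` are made up for illustration, and treating "everywhere between two latitudes" as a box spanning all longitudes is one possible reading of the earlier question.

```python
def schema_org_box(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Return a GeoShape box string in schema.org order: "miny minx maxy maxx"."""
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat <= max_lat <= 90")
    # Boxes that cross the antimeridian (min_lon > max_lon) are not handled here.
    if not (-180 <= min_lon <= max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min_lon <= max_lon <= 180")
    return f"{min_lat:g} {min_lon:g} {max_lat:g} {max_lon:g}"


def latitude_band(min_lat: float, max_lat: float) -> str:
    """Box covering every longitude between two latitudes."""
    return schema_org_box(-180, min_lat, 180, max_lat)


global_box = schema_org_box(-180, -90, 180, 90)
southern_band = latitude_band(-60, -30)
```

Here `global_box` is "-90 -180 90 180" and `southern_band` is "-60 -180 -30 180".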

@pbuttigieg
Collaborator

thanks @jmckenna - wasn't there an issue with the WGS84 link pointing explicitly to lat lon? perhaps we should remove that suffix

@bkatiemills

Thanks for your feedback here, folks - please see https://gist.github.com/bkatiemills/75efe5e9d6e67d8aa7f5add617e6591c for a schematic of where we're at. Can you read this over and make sure we're not going wildly off the rails here? Also we need input on the distribution block, it's not obvious to me what type and encodingFormat should be now that we're going for a link to an API helper rather than a static block of data.

Once this schematic is looking correct, I can write some scripts to fill in the things that need nightly updating and provide you with a URL to fetch.

@jmckenna
Collaborator Author

@bkatiemills today both teams reviewed your Gist together and made some changes, below:

{
    "@context": {
        "@vocab": "https://schema.org/"
    },
    "@type": "Dataset",
    "@id": "https://registry.org/permanentUrlToThisJsonDoc",
    "name": "Argovis' representation of the Argo dataset",
    "description": "Argovis provides a representation of the profiles collected over the lifetime of the Argo program. This representation is intended to present an interpretation of Argo data that is lightly simplified from the original product, but still appropriate for a large majority of scientific and educational use cases. Simplifications include presenting delayed (better corrected and QCed) mode data where available; presenting interpolated biogeochemical data only; and merging core and biogeochemical data collected in parallel into unified oceanic profiles.",
    "url": "https://github.com/argovis/demo_notebooks/blob/main/Intro_to_Argovis.ipynb",
    "license": "MIT", // should be more complete, the full name of the license or the link to it
    "citation": [
        "Tucker, T., D. Giglio, M. Scanderbeg, and S.S.P. Shen: Argovis: A Web Application for Fast Delivery, Visualization, and Analysis of Argo Data. J. Atmos. Oceanic Technol., 37, 401–416, https://doi.org/10.1175/JTECH-D-19-0041.1",
        "Wong, A. P. S., et al. (2020), Argo Data 1999–2019: Two Million Temperature-Salinity Profiles and Subsurface Velocity Observations From a Global Array of Profiling Floats, Frontiers in Marine Science, 7(700), doi: https://doi.org/10.3389/fmars.2020.00700",
        "Argo (2000). Argo float data and metadata from Global Data Assembly Centre (Argo GDAC). SEANOE. https://doi.org/10.17882/42182"
    ],
    "creator": "", // can be an array, with Person or Organisation types
    "version": "<timestamp to be updated on db write>",
    "keywords": [
        "Argo", 
        "ocean profiles", 
        "temperature", 
        "salinity", 
        "pressure", 
        "ocean biogeochemistry"
    ],
    "measurementTechnique": "http://www.argodatamgt.org/Documentation",
    "variableMeasured": [
        {
            "@type": "PropertyValue",
            "name": "<name from data_info[0]>",
            "url": "Perhaps a link to the ADMT docs that explain their variables?",
            "description": "<long name from data_info[2]>",
            "unitCode": "<units from data_info[2]>"
        },
        // ... to be enumerated for all variables
    ],
    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "url": "https://argovis.colorado.edu/citations"
    },
    "temporalCoverage": "<min year>/<max year>", // we can consider using "to present" (is it "now"? to check how to do this) or (more accurate?) just update this to exact ISO timestamp every day
    "distribution": {
        "@type": "DataDownload", 
        "url": "https://argovis.colorado.edu/argourlhelper",
        "description": "Argovis provides no direct download of the dataset described in this record as it is too large to download in one click; however, please visit https://argovis.colorado.edu/argourlhelper to dynamically access your own subset of data"
    },
    "spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": "-90 -180 90 180" // miny minx maxy maxx
        },
        "additionalProperty": {
            "@type": "PropertyValue",
            "propertyID": "http://dbpedia.org/resource/Spatial_reference_system",
            "value": "http://www.w3.org/2003/01/geo/wgs84_pos"
        }
    },
    "provider": [
        {
            "@type": "Organization",
            "legalName": "University of Colorado Boulder",
            "name": "Department of Atmospheric and Ocean Science",
            "url": "https://www.colorado.edu/atoc/"
        }
    ]
}
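
On the open question in the temporalCoverage comment above: the schema.org description of temporalCoverage says open-ended ranges can be written with ".." in place of the end date (its example is "2015-11/.."). A minimal Python sketch of both options, open-ended or refreshed with an exact end date, follows; the function name `temporal_coverage` and the dates used are illustrative assumptions, not Argo's actual extents.

```python
from datetime import date
from typing import Optional


def temporal_coverage(start: date, end: Optional[date] = None) -> str:
    """Format an ISO 8601 interval; a missing end date yields an open-ended range."""
    if end is None:
        return f"{start.isoformat()}/.."
    if end < start:
        raise ValueError("end date precedes start date")
    return f"{start.isoformat()}/{end.isoformat()}"


# Illustrative dates only.
open_ended = temporal_coverage(date(2000, 1, 1))
closed = temporal_coverage(date(2000, 1, 1), date(2024, 6, 1))
```

A nightly update script could call the second form with the latest timestamp in the database, which is more precise than an open-ended range but requires the record to be regenerated regularly.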

@bkatiemills

@jmckenna thanks for your feedback! The gist is updated to reflect it - I have no outstanding questions here, the only remaining blanks are things to be filled in by the nightly update scripts (variables present, temporospatial extents). I'll try and find some time to implement this soon, and provide you with a URL you can scrape and tell me if the finished product is as expected.

@bkatiemills

Ok team, here's a first production attempt at a blob of jsonld for the argo collection, let me know what you think: https://argovis-api.colorado.edu/summary?id=argo_jsonld&key=jsonld

@jmckenna
Collaborator Author

jmckenna commented Jun 6, 2024

@bkatiemills thanks, looks good. I think for the ODIS front-end search, 2 very useful parameters missing are sdPublisher (party responsible for generating the metadata, in other words similar to your existing provider section), and creditText (how to cite the dataset), see the template. @pbuttigieg thoughts?

@bkatiemills

Thanks @jmckenna - do you think we can just change provider to sdPublisher? I'm not sure there's a difference or a reason on our end to have both.

How does creditText differ from citation? We could do both, but I think we'd just use the first entry from citation as creditText.

@jmckenna
Collaborator Author

jmckenna commented Jun 8, 2024

I recommend changing the provider text to sdPublisher (makes it easier on our front-end search code)

creditText would be the "Recommended Citation" for your dataset, whereas citation is used when you are referring to using someone else's creative work or dataset.

(in the ODIS front-end search results creditText is displayed literally as "Recommended Citation")

Yes I would just change to using only the creditText, with the first value, as you said.
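
The two changes agreed above could be applied to an existing record along these lines. This is a minimal Python sketch under one reading of the discussion (rename provider to sdPublisher, keep only the first citation entry as creditText); the function name `apply_review_changes` and the shortened sample record are hypothetical.

```python
import copy


def apply_review_changes(record: dict) -> dict:
    """Rename provider to sdPublisher and replace citation with creditText."""
    updated = copy.deepcopy(record)  # leave the caller's record untouched
    if "provider" in updated:
        updated["sdPublisher"] = updated.pop("provider")
    citations = updated.pop("citation", [])
    if isinstance(citations, str):
        citations = [citations]
    if citations:
        # creditText is the "Recommended Citation" shown in the ODIS search results.
        updated["creditText"] = citations[0]
    return updated


# Shortened sample record, for illustration only.
record = {
    "@type": "Dataset",
    "citation": ["First citation text", "Second citation text"],
    "provider": [{"@type": "Organization", "name": "Example department"}],
}
updated_record = apply_review_changes(record)
```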

@jmckenna
Collaborator Author

(link to type WebAPI template discussed in today's meeting)

@bkatiemills

@jmckenna sounds good, those suggestions will be reflected in tonight's update. We've also made the sitemap and cat entry as discussed; please let us know any further steps needed.

@bkatiemills

Hi team - please let us know when our Argo record appears in your datasets list so we can confirm we hit all the requirements correctly; if there's something missing, also please let me know.

@bkatiemills

Hi folks - we still don't see argovis appearing at https://oceaninfohub.org/results/Dataset?search_text=argovis&page=0 - is something wrong on our end we can address? Am I looking in the wrong place?

@jmckenna jmckenna marked this pull request as ready for review November 19, 2024 14:00
@jmckenna jmckenna merged commit 8d3dfbe into master Nov 19, 2024
5 checks passed