how to advertise presence of different cadences for a dataset #226

Open
jvandegriff opened this issue Sep 23, 2024 · 5 comments

Comments

@jvandegriff
Collaborator

jvandegriff commented Sep 23, 2024

The problem: how to let HAPI clients know, in an automated way, that a dataset is available at several cadences, so that clients can automatically select an appropriate cadence.

This issue relates to several other issues about linkages, file listings, etc. It focuses on solving the cadence problem in a way that doesn't paint us into a corner when it comes to linking file listings, availability info, and possibly even images or semantic descriptions of data.

Other related issues:

Two existing servers have solved the cadence linking problem in closely related ways, so we should come up with a recommended way.

The KNMI solution:

```json
"x_relations": [
    {
        "id": "brik_ii_electron_density_PT0_005S",
        "description": "Electron density data from the BRIK-II scintillation monitor, downsampled to 5 millisecond cadence",
        "cadence": "PT0.005S",
        "type": "resample",
        "method": "max",
        "add": "automatic"
    },
    {
        "id": "brik_ii_electron_density_PT0_01S",
        "description": "Electron density data from the BRIK-II scintillation monitor, downsampled to 10 millisecond cadence",
        "cadence": "PT0.01S",
        "type": "resample",
        "method": "max",
        "add": "automatic"
    },
    {
        "id": "brik_ii_electron_density_PT0_05S",
        "description": "Electron density data from the BRIK-II scintillation monitor, downsampled to 50 millisecond cadence",
        "cadence": "PT0.05S",
        "type": "resample",
        "method": "max",
        "add": "automatic"
    }
]
```
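To illustrate how a client could use a block like this to select a cadence automatically, here is a minimal sketch. It assumes the `x_relations` list has already been parsed into Python dicts. It handles only the time-only `PT...H/M/S` form of ISO 8601 durations, and it uses a "coarsest cadence that still meets the requested resolution" rule. The helper names and that rule are hypothetical choices made for this example; they are not part of HAPI or the KNMI server.

```python
from __future__ import annotations

import re

# Hypothetical helpers: nothing here is part of HAPI or the KNMI server.
# Only the time part of an ISO 8601 duration is handled ("PT0.005S", "PT1M", "PT1H30M").
_PT_DURATION = re.compile(
    r"^PT(?:(\d+(?:\.\d+)?)H)?(?:(\d+(?:\.\d+)?)M)?(?:(\d+(?:\.\d+)?)S)?$"
)


def parse_pt_duration(text: str) -> float:
    """Return the length in seconds of a simple 'PT...' ISO 8601 duration."""
    match = _PT_DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def pick_cadence(relations: list[dict], max_seconds: float) -> str | None:
    """Pick the coarsest related dataset whose cadence is no longer than max_seconds."""
    candidates = []
    for relation in relations:
        seconds = parse_pt_duration(relation["cadence"])
        if seconds <= max_seconds:
            candidates.append((seconds, relation["id"]))
    return max(candidates)[1] if candidates else None


x_relations = [
    {"id": "brik_ii_electron_density_PT0_005S", "cadence": "PT0.005S"},
    {"id": "brik_ii_electron_density_PT0_01S", "cadence": "PT0.01S"},
    {"id": "brik_ii_electron_density_PT0_05S", "cadence": "PT0.05S"},
]
# A client that needs at least one point every 20 ms gets the 10 ms dataset.
print(pick_cadence(x_relations, 0.02))  # brik_ii_electron_density_PT0_01S
```

Choosing the coarsest acceptable cadence keeps the download small. A different client might instead want the finest available cadence below some size limit.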

The Intermagnet solution:

The Intermagnet HAPI server (catalog at https://imag-data.bgs.ac.uk/GIN_V1/hapi/catalog) does it differently. There are nine or more versions of each dataset: some for different processing options, and some for different cadences. Here is the start of the catalog:

```json
  "catalog" : [ {
    "id" : "aae/definitive/PT1M/native",
    "title" : "Definitive minute data in NATIVE orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/definitive/PT1M/xyzf",
    "title" : "Definitive minute data in XYZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/definitive/PT1M/hdzf",
    "title" : "Definitive minute data in HDZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/definitive/PT1M/diff",
    "title" : "Definitive minute data in DIFF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1M/native",
    "title" : "Quasi-def minute data in NATIVE orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1M/xyzf",
    "title" : "Quasi-def minute data in XYZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1M/hdzf",
    "title" : "Quasi-def minute data in HDZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1M/diff",
    "title" : "Quasi-def minute data in DIFF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/reported/PT1M/native",
    "title" : "Reported minute data in NATIVE orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1M/native",
    "title" : "Best-avail minute data in NATIVE orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1M/xyzf",
    "title" : "Best-avail minute data in XYZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1M/hdzf",
    "title" : "Best-avail minute data in HDZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1M/diff",
    "title" : "Best-avail minute data in DIFF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1S/native",
    "title" : "Quasi-def second data in NATIVE orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1S/xyzf",
    "title" : "Quasi-def second data in XYZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1S/hdzf",
    "title" : "Quasi-def second data in HDZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/quasi-def/PT1S/diff",
    "title" : "Quasi-def second data in DIFF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/reported/PT1S/native",
    "title" : "Reported second data in NATIVE orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1S/native",
    "title" : "Best-avail second data in NATIVE orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1S/xyzf",
    "title" : "Best-avail second data in XYZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1S/hdzf",
    "title" : "Best-avail second data in HDZF orientation from Addis Ababa, Ethiopia (AAE)"
  }, {
    "id" : "aae/best-avail/PT1S/diff",
    "title" : "Best-avail second data in DIFF orientation from Addis Ababa, Ethiopia (AAE)"
  },
  ...
```

  • In this Intermagnet catalog, the cadence is one of the slash-separated components of each dataset id.
  • There is no explicit linkage between the different cadences.
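Because the linkage is only implicit, a client would have to reverse-engineer it from the ids. Here is a sketch, assuming every id follows the station/processing/cadence/orientation pattern seen in the excerpt above. That pattern is inferred from these examples and is not a documented guarantee. The function name is hypothetical.

```python
from collections import defaultdict


def group_by_cadence(dataset_ids):
    """Hypothetical helper: map (station, processing, orientation) -> {cadence: dataset id}."""
    groups = defaultdict(dict)
    for dataset_id in dataset_ids:
        parts = dataset_id.split("/")
        if len(parts) != 4:
            continue  # skip ids that do not follow the assumed four-part pattern
        station, processing, cadence, orientation = parts
        groups[(station, processing, orientation)][cadence] = dataset_id
    return dict(groups)


ids = [
    "aae/definitive/PT1M/native",
    "aae/quasi-def/PT1M/xyzf",
    "aae/quasi-def/PT1S/xyzf",
    "aae/best-avail/PT1M/hdzf",
    "aae/best-avail/PT1S/hdzf",
]
groups = group_by_cadence(ids)
print(groups[("aae", "quasi-def", "xyzf")])
# {'PT1M': 'aae/quasi-def/PT1M/xyzf', 'PT1S': 'aae/quasi-def/PT1S/xyzf'}
```

Nothing in the catalog states that two ids differing only in the cadence component are the same data at different cadences. An explicit linkage would provide exactly that information.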
@jvandegriff
Collaborator Author

I propose a new endpoint for linkages.

This endpoint would house content like what the KNMI server has in the x_relations block.

It keeps the info response for a dataset about that dataset only. Adding linkage info to a dataset's info block creates a bad dependency. The info would then depend on more than the content of that dataset, namely on how the dataset relates to other datasets. That relational info will change over time and break the info for the basic data, and that kind of dependency should be avoided.

This solution also allows expanding to other linkages (file listings, availability info, etc.).

Stating that datasets are linked by cadence should imply that the important parameters are present in each of the linked datasets and have the same names. If this is not true, clients will have a hard time matching up the same parameter across datasets of different cadences.
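A client could check this assumption before treating datasets as alternate cadences of each other. The sketch below assumes the HAPI info responses have already been fetched and parsed into dicts. It relies only on each response's "parameters" list and each parameter's "name". The helper names and the toy parameter names are hypothetical.

```python
# Hypothetical helpers; only the "parameters"/"name" structure comes from HAPI info responses.
def parameter_names(info):
    """Names of the parameters listed in one parsed HAPI info response."""
    return {p["name"] for p in info["parameters"]}


def missing_parameters(infos_by_id):
    """For each linked dataset, the names some other linked dataset has but it lacks."""
    all_names = set().union(*(parameter_names(i) for i in infos_by_id.values()))
    return {
        dataset_id: sorted(all_names - parameter_names(info))
        for dataset_id, info in infos_by_id.items()
    }


# Toy info responses (hypothetical parameter names): the 10 ms dataset renamed "density" to "Ne".
infos = {
    "brik_ii_electron_density": {"parameters": [{"name": "Time"}, {"name": "density"}]},
    "brik_ii_electron_density_PT0_005S": {"parameters": [{"name": "Time"}, {"name": "density"}]},
    "brik_ii_electron_density_PT0_01S": {"parameters": [{"name": "Time"}, {"name": "Ne"}]},
}
print(missing_parameters(infos))
# {'brik_ii_electron_density': ['Ne'], 'brik_ii_electron_density_PT0_005S': ['Ne'],
#  'brik_ii_electron_density_PT0_01S': ['density']}
```

Any non-empty list means that the linked datasets do not share the same parameter names.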

KNMI has this:

```
"x_relations": [
    {
        "id": "brik_ii_electron_density_PT0_005S",
        "description": "Electron density data from the BRIK-II scintillation monitor, downsampled to 5 millisecond cadence",
        "cadence": "PT0.005S",
        "type": "resample",
        "method": "max",
        "add": "automatic"
    },
    ...
]
```

The relations endpoint would then need to specify what kind of linkage it is (here, a cadence linkage):

```
"cadences": [
    {
        "source_dataset": "brik_ii_electron_density",  # this is the highest-resolution (smallest cadence) dataset
        "other_cadences": [
            {
                "id": "brik_ii_electron_density_PT0_005S",
                # don't need to include the cadence: that is available in the `info` of the listed datasets
                "cadence": "PT0.005S"
                # DO WE INCLUDE TYPE?  "type": "resample", "method": "max"
                # no: any resampling info clearly belongs in the `info` block of the resampled dataset, just like cadence
            },
            { "id": "brik_ii_electron_density_PT0_001M" },
            { "id": "brik_ii_electron_density_PT0_005M" }
        ]
    }
]
```

Other kinds of linkages could be specified too:

```
"file_listings": [
    {
        "source_dataset": "brik_ii_electron_density",
        "listing_dataset": "brik_ii_electron_density_LISTING"   # this would need to follow a specific schema!!!
    }
]
```
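To make the proposal concrete, here is a sketch of how a client might consume such a relations response. The document below is the cadences/file_listings example above, with the "#" comments and the optional "cadence" field removed so that it is plain JSON. The endpoint is still only a proposal, and the function names are hypothetical.

```python
import json

# Hypothetical relations response, built from the examples above.
RELATIONS = json.loads("""
{
  "cadences": [
    {
      "source_dataset": "brik_ii_electron_density",
      "other_cadences": [
        {"id": "brik_ii_electron_density_PT0_005S"},
        {"id": "brik_ii_electron_density_PT0_001M"},
        {"id": "brik_ii_electron_density_PT0_005M"}
      ]
    }
  ],
  "file_listings": [
    {
      "source_dataset": "brik_ii_electron_density",
      "listing_dataset": "brik_ii_electron_density_LISTING"
    }
  ]
}
""")


def cadence_family(relations, dataset_id):
    """Hypothetical helper: [source, *others] for the group containing dataset_id, else []."""
    for group in relations.get("cadences", []):
        members = [group["source_dataset"]] + [o["id"] for o in group["other_cadences"]]
        if dataset_id in members:
            return members
    return []


def file_listing_for(relations, dataset_id):
    """Hypothetical helper: the listing dataset registered for dataset_id, or None."""
    for link in relations.get("file_listings", []):
        if link["source_dataset"] == dataset_id:
            return link["listing_dataset"]
    return None


print(cadence_family(RELATIONS, "brik_ii_electron_density_PT0_001M"))
print(file_listing_for(RELATIONS, "brik_ii_electron_density"))
```

Because the linkages live in their own response, a client can answer both questions without touching any dataset's info response. This is the separation argued for above.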

@jvandegriff
Collaborator Author

If we use RDF triples in JSON-LD, the relations would look like this:

  • "brik_ii_electron_density_PT1M" isAltCadenceOf "brik_ii_electron_density"
  • "brik_ii_electron_density_PT1H" isAltCadenceOf "brik_ii_electron_density"

Or for file listings:

  • "brik_ii_electron_density_listing" isFileListingOf "brik_ii_electron_density"

Or maybe JSON-LD expresses it in the other direction:

  • "brik_ii_electron_density" isHigherResolutionOf "brik_ii_electron_density_PT1M"
  • "brik_ii_electron_density" isHigherResolutionOf "brik_ii_electron_density_PT1H"
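Here is a sketch of generating such triples from one cadence group, as plain (subject, predicate, object) tuples in both directions. The predicate names are the tentative ones from the bullets above, written in camel case. They are not terms from any published vocabulary. The helper is hypothetical.

```python
def cadence_triples(source, others):
    """Hypothetical helper: link each alternate-cadence dataset to its source, both ways."""
    triples = []
    for other in others:
        triples.append((other, "isAltCadenceOf", source))
        triples.append((source, "isHigherResolutionOf", other))
    return triples


for triple in cadence_triples(
    "brik_ii_electron_density",
    ["brik_ii_electron_density_PT1M", "brik_ii_electron_density_PT1H"],
):
    print(*triple)
```

Note that the function must be told which dataset is the source. The direction of each triple cannot be derived from the names alone.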

@jvandegriff
Collaborator Author

jvandegriff commented Sep 24, 2024

Here's a simple version of the non-JSON-LD way of capturing this info:

```json
"cadences": [
    {
        "smallest": "brik_ii_electron_density",
        "other": [ "brik_ii_electron_density_PT0_005S", "brik_ii_electron_density_PT0_001M",
                   "brik_ii_electron_density_PT0_005M" ]
    }
],
"file_listings": [
    { "sourceDataset": "brik_ii_electron_density",
      "listingDataset": "brik_ii_electron_density_LISTING" },
    { "sourceDataset": "magnetic_field",
      "listingDataset": "magnetic_field_LISTING" }
]
```

Or, even simpler, as a list of lists (which assumes the first dataset in each list has a primary role as the one with the smallest cadence):

```json
"cadences": [
    [ "brik_ii_electron_density", "brik_ii_electron_density_PT0_005S", "brik_ii_electron_density_PT0_001M",
      "brik_ii_electron_density_PT0_005M" ],
    [ "brik_ii_proton_density", "brik_ii_proton_density_PT0_005S", "brik_ii_proton_density_PT0_001M",
      "brik_ii_proton_density_PT0_005M" ]
]
```

But plain strings cannot be extended with server-specific details, so this option is ruled out.

That leaves a choice between these two formats:

```json
"cadences": [
    {
        "smallest": { "dataset": "brik_ii_electron_density" },
        "other": [
            { "dataset": "brik_ii_electron_density_PT0_005S", "x_type": "resample", "x_method": "max" },
            { "dataset": "brik_ii_electron_density_PT0_001M" },
            { "dataset": "brik_ii_electron_density_PT0_005M" }
        ]
    }
]
```

Inside each object, you could add "x_" items like those in the KNMI server example, as shown for one entry above.

Or, just make an ordered list with the smallest cadence first:

```json
"cadences": [
    [
        { "dataset": "brik_ii_electron_density" },
        { "dataset": "brik_ii_electron_density_PT0_005S", "x_type": "resample", "x_method": "max" },
        { "dataset": "brik_ii_electron_density_PT0_001M" },
        { "dataset": "brik_ii_electron_density_PT0_005M" }
    ],
    [
        { "dataset": "brik_ii_proton_density" },
        { "dataset": "brik_ii_proton_density_PT0_005S", "x_type": "resample", "x_method": "max" },
        { "dataset": "brik_ii_proton_density_PT0_001M" },
        { "dataset": "brik_ii_proton_density_PT0_005M" }
    ]
]
```
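Both remaining formats carry the same information, so a client could accept either one. Here is a sketch of normalizing the ordered-list form into the smallest/other form. It relies on the convention stated above that the first entry in each list has the smallest cadence. The function name is hypothetical.

```python
def to_smallest_other(groups):
    """Hypothetical helper: convert [[first, *rest], ...] into [{"smallest": first, "other": rest}, ...]."""
    result = []
    for group in groups:
        if not group:
            raise ValueError("empty cadence group")
        first, *rest = group  # position 0 is assumed to be the smallest cadence
        result.append({"smallest": first, "other": rest})
    return result


ordered = [
    [
        {"dataset": "brik_ii_electron_density"},
        {"dataset": "brik_ii_electron_density_PT0_005S", "x_type": "resample", "x_method": "max"},
        {"dataset": "brik_ii_electron_density_PT0_001M"},
    ],
]
converted = to_smallest_other(ordered)
print(converted[0]["smallest"]["dataset"])  # brik_ii_electron_density
print(len(converted[0]["other"]))  # 2
```

The conversion works only because of the position convention. The object form states each role explicitly, which is the "self-explanatory" trade-off weighed in the next paragraph.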

It's hard to tell which one is better: the more expanded one with "smallest" and "other", or just the list. The list is simpler, but less self-explanatory.

We discussed this further on Wed Sep 15: the "x_" fields could just go in the info response, since that is where all the details for that dataset should be anyway. These would be "x_" tags related to cadence, such as "x_cadence_type" and "x_cadence_method".

@jvandegriff
Collaborator Author

More discussion on this:

It looks like we need to keep the longer version (not just a list of dataset names) so that RDF triples can be created with the right direction for the relationship.

We also don't want to allow or encourage extra metadata items describing the alternate datasets, since all of those details should instead be found in the info/ response of the dataset itself. The linkage metadata should ONLY record the fact that there is a link (i.e., don't mix extra dataset metadata into the linkages).

The names "smallest" and "other" should reflect the language of established linkage mechanisms such as schema.org and science-on-schema.org, so maybe "source" and "product".

See for example:

This, then, is the current thinking:

```json
"cadences": [
    {
        "source": { "dataset": "brik_ii_electron_density" },
        "product": [
            { "dataset": "brik_ii_electron_density_PT0_005S" },
            { "dataset": "brik_ii_electron_density_PT0_001M" },
            { "dataset": "brik_ii_electron_density_PT0_005M" }
        ]
    }
]
```
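Since the linkage objects should say only that a link exists, a server or client could enforce that each entry carries nothing but a "dataset" key. Here is a sketch of such a check for the source/product structure above. Rejecting extra keys is one interpretation of the preceding discussion, not an agreed rule, and the function is hypothetical.

```python
ALLOWED_KEYS = {"dataset"}  # assumption: linkage entries carry nothing but the dataset id


def validate_cadences(cadences):
    """Hypothetical validator: return a list of problems in a source/product cadence list."""
    problems = []
    for i, group in enumerate(cadences):
        entries = [("source", group.get("source"))]
        entries += [(f"product[{j}]", p) for j, p in enumerate(group.get("product", []))]
        if not group.get("product"):
            problems.append(f"cadences[{i}]: no product datasets")
        for where, entry in entries:
            if not isinstance(entry, dict) or "dataset" not in entry:
                problems.append(f"cadences[{i}].{where}: missing 'dataset'")
            elif set(entry) - ALLOWED_KEYS:
                extra = sorted(set(entry) - ALLOWED_KEYS)
                problems.append(f"cadences[{i}].{where}: unexpected keys {extra}")
    return problems


current = [
    {
        "source": {"dataset": "brik_ii_electron_density"},
        "product": [
            {"dataset": "brik_ii_electron_density_PT0_005S"},
            {"dataset": "brik_ii_electron_density_PT0_001M"},
            {"dataset": "brik_ii_electron_density_PT0_005M"},
        ],
    }
]
print(validate_cadences(current))  # []

with_extra = [{"source": {"dataset": "a"}, "product": [{"dataset": "b", "x_method": "max"}]}]
print(validate_cadences(with_extra))  # ["cadences[0].product[0]: unexpected keys ['x_method']"]
```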

@jvandegriff
Collaborator Author

We intend to make it easy for these kinds of linkages to map into JSON-LD, so we really need to learn more about that!!

https://json-ld.org/primer/latest/
