Unstructured JSON metadata #279

eirini-zormpa · 2024-11-15T12:43:15Z

About

Metadata created for our courses (from the YAML headers and elsewhere) are compiled into a JSON file. The problem is primarily that the metadata included there is unstructured and difficult to make sense of. There is also unnecessary information there, but that is less problematic. It is also likely that the metadata that is included doesn't follow the convention set out by Bioschemas.org.

It is important that our training materials are interoperable with other training materials (e.g. from the Carpentries) so this should be resolved to harmonise our metadata with commonly used metadata standards.

Further information

This is visible through the inspector tab on the browser's developer tools. The relevant section is: <script id="__NEXT_DATA__" type="application/json">

The equivalent for a Carpentries lesson looks like this:

eirini-zormpa · 2024-11-15T12:48:30Z

I believe this is the part of the workbench infrastructure that creates the json-ld file for the carpentries lessons: https://github.com/carpentries/sandpaper/blob/main/R/utils-metadata.R

martinjrobins · 2024-11-15T13:35:57Z

So the issue here is that the JSON you are seeing from our website was never meant for public consumption; it's just a JavaScript variable that is used by the page source to render the page.

I'm curious what carpentries uses the json-ld metadata for? Is it just for web crawlers? Google in particular has web crawlers that feed information into its search and knowledge base, is the aim to feed these, or is there a wider goal?

In any case, feeding our tech company overlords is probably a good thing for search engine optimisation, so it's probably something we should have. Next.js has a way of publishing metadata for your site, see https://nextjs.org/docs/app/building-your-application/optimizing/metadata. I think it's just a matter of agreeing on a format. I'm happy to just re-use the Carpentries one?

martinjrobins · 2024-11-15T13:40:11Z

this is the schema that carpentries uses:
https://bioschemas.org/profiles/TrainingMaterial/1.0-RELEASE

I notice they also have schema for courses:
https://bioschemas.org/profiles/Course/1.0-RELEASE

and course instances:
https://bioschemas.org/profiles/CourseInstance/1.0-RELEASE

which could also be useful for us
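A rough sketch of what publishing a record against the TrainingMaterial profile could look like. The function names and sample fields are hypothetical, and the `@type` and `conformsTo` layout are my reading of the profile page, so they should be checked against it before we rely on them:

```python
import json

# Profile URL taken from the link above.
TRAINING_MATERIAL_PROFILE = "https://bioschemas.org/profiles/TrainingMaterial/1.0-RELEASE"


def to_json_ld(name: str, description: str, keywords: list[str]) -> dict:
    """Build a small JSON-LD record for one piece of training material.

    Assumption: the profile is expressed as a schema.org LearningResource
    that declares conformance through the Dublin Core conformsTo property.
    """
    return {
        "@context": "https://schema.org",
        "@type": "LearningResource",
        "http://purl.org/dc/terms/conformsTo": {
            "@id": TRAINING_MATERIAL_PROFILE,
            "@type": "CreativeWork",
        },
        "name": name,
        "description": description,
        # Joining into one comma-separated string is a choice made here,
        # not something the thread has agreed on.
        "keywords": ", ".join(keywords),
    }


def to_script_tag(record: dict) -> str:
    """Wrap a record in the script element used to embed JSON-LD in HTML."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(record, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Unlike the `__NEXT_DATA__` blob, a tag like this is intended for crawlers and other consumers, so it can stay separate from whatever the page needs internally to render.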

eirini-zormpa · 2024-11-15T14:11:04Z

not entirely sure what it's used for to be honest 😅 as you say though, it's probably a good idea to do it anyway and link up with Toby once we have 😊

something else to be mindful of when we change the published metadata is renaming some of our metadata fields to be consistent with Bioschemas. Related to that, I'm not completely sure if we should use the Training Material or Course metadata.

Here is a list of what I found to be different and how I think it should be:

instead of summary use description
instead of tags use keywords
instead of learningOutcomes use teaches
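A minimal sketch of these renames, assuming the front matter has already been parsed into a plain dict. The function name `rename_fields` and the sample values are hypothetical, and this is not how gutenberg currently handles metadata:

```python
# Renames proposed above: our field name -> Bioschemas property name.
FIELD_RENAMES = {
    "summary": "description",
    "tags": "keywords",
    "learningOutcomes": "teaches",
}


def rename_fields(front_matter: dict) -> dict:
    """Return a copy of front_matter with the proposed renames applied.

    Fields that are not listed in FIELD_RENAMES pass through unchanged.
    """
    return {FIELD_RENAMES.get(key, key): value for key, value in front_matter.items()}


# Hypothetical front matter, for illustration only.
example = {
    "name": "Example section",
    "summary": "A short description of the section.",
    "tags": ["testing", "python"],
    "learningOutcomes": ["Write a unit test."],
}

renamed = rename_fields(example)
print(sorted(renamed))
```

Doing the rename at publish time like this would leave the existing YAML headers untouched, which may be easier than rewriting every file.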

A few of these I'm not so sure about.

For example, we use attribution both to cite the places we reused materials from AND to tell other people how to cite us. As best as I can tell, for materials that were created by UNIVERSE-HPC, we should use the author field. For stuff we're using from others, the best option is likely citation. Note though that this only appears as an option for the Course metadata and not the Training Material metadata 😞
Arguably, coursePrerequisites is like our dependsOn, though I think the dependsOn field is used for the graph and we may want to list things as prerequisites even if we don't have materials that teach them.

Other stuff we might want to include:

timeRequired
provider (for courses)
hasCourseInstance (for courses)

alasdairwilson · 2024-11-27T14:44:07Z

I am not sure what this is about. I think what they are doing is keeping a set of metadata so that their course delivery website can use it, e.g. to search courses. That serves a similar purpose to the front matter YAML schema that UNIVERSE-HPC is using, and that gutenberg of course relies on as well. But gutenberg itself is not serving any of this JSON to the user; as Martin said, it is under-the-hood stuff. Fields like hasCourseInstance are presumably a specific requirement of their deployment; others are just a different field serving a similar purpose.

"Similar purpose" is doing a lot of work here: simply renaming these fields would require significant rewrites, and it still would not guarantee that the fields serve the same purpose they do in this other project.

There is definitely scope for extending the metadata that comes along with the course, and as that metadata is extended we can make more use of it in gutenberg. If you had timeRequired, for example, we could display it. In general, though, our courses are not a "course" in the traditional sense; rather, they are groupings of related material. So this again comes back to the "similar purpose" rather than "same purpose" argument.
