Unstructured JSON metadata #279

Open
eirini-zormpa opened this issue Nov 15, 2024 · 4 comments

Comments

@eirini-zormpa

About

Metadata created for our courses (from the YAML headers and elsewhere) is compiled into a JSON file. The main problem is that the metadata included there is unstructured and difficult to make sense of. It also contains unnecessary information, but that is less of a problem. It is also likely that the included metadata doesn't follow the conventions set out by Bioschemas.org.

It is important that our training materials are interoperable with other training materials (e.g. from the Carpentries) so this should be resolved to harmonise our metadata with commonly used metadata standards.

Further information

This is visible through the inspector tab in the browser's developer tools. The relevant section is: <script id="__NEXT_DATA__" type="application/json">
Screenshot 2024-11-15 at 12 16 46
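
To look at the same data outside the browser, a minimal TypeScript sketch could pull the embedded JSON out of a saved copy of the page. The helper name `extractNextData` and the sample HTML are hypothetical; the sketch assumes only that the page contains a single `<script id="__NEXT_DATA__">` element whose body is valid JSON.

```typescript
// Hypothetical helper: returns the parsed JSON payload of the __NEXT_DATA__
// script tag found in an HTML string, or null if no such tag is present.
function extractNextData(html: string): unknown {
  const match = html.match(
    /<script\b[^>]*\bid="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );
  const payload = match?.[1];
  if (payload === undefined) {
    return null;
  }
  return JSON.parse(payload);
}

// Made-up sample page, for illustration only.
const samplePage =
  '<html><body><script id="__NEXT_DATA__" type="application/json">' +
  '{"props":{"pageProps":{"title":"Example"}}}' +
  "</script></body></html>";

const nextData = extractNextData(samplePage) as {
  props: { pageProps: { title: string } };
};
console.log(nextData.props.pageProps.title);
```

A regular expression is enough for a quick inspection like this; a proper HTML parser would be the safer choice for anything more robust.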

The equivalent for a Carpentries lesson looks like this:
Screenshot 2024-11-15 at 11 47 48

@eirini-zormpa
Author

I believe this is the part of the workbench infrastructure that creates the json-ld file for the carpentries lessons: https://github.com/carpentries/sandpaper/blob/main/R/utils-metadata.R

@martinjrobins
Contributor

So the issue here is that the JSON you are seeing from our website was never meant for public consumption; it's just a JavaScript variable that is used by the page source to render the page.

I'm curious what carpentries uses the json-ld metadata for? Is it just for web crawlers? Google in particular has web crawlers that feed information into its search and knowledge base, is the aim to feed these, or is there a wider goal?

In any case, feeding our tech company overlords is probably a good thing for search engine optimisation, so it's probably something we should have. Next.js has a way of publishing metadata for your site, see https://nextjs.org/docs/app/building-your-application/optimizing/metadata. I think it's just a matter of agreeing on a format. I'm happy to just re-use the Carpentries one?

@martinjrobins
Contributor

this is the schema that carpentries uses:
https://bioschemas.org/profiles/TrainingMaterial/1.0-RELEASE

I notice they also have schema for courses:
https://bioschemas.org/profiles/Course/1.0-RELEASE

and course instances:
https://bioschemas.org/profiles/CourseInstance/1.0-RELEASE

which could also be useful for us
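
As a rough illustration of what publishing this could involve, here is a minimal TypeScript sketch that builds a JSON-LD object for one piece of training material and serialises it into a `<script type="application/ld+json">` tag. Everything in it is an assumption rather than our actual code: the `MaterialInput` shape, the function names and the example values are hypothetical, and it assumes the TrainingMaterial profile is expressed as a schema.org `LearningResource` with a `dct:conformsTo` pointer to the profile URL above.

```typescript
// Hypothetical input shape; not the real structure of our course metadata.
interface MaterialInput {
  name: string;
  description: string;
  keywords: string[];
  url: string;
}

// Builds a JSON-LD object. The @type and dct:conformsTo values are
// assumptions about how the Bioschemas profile is declared.
function toJsonLd(material: MaterialInput): Record<string, unknown> {
  return {
    "@context": ["https://schema.org", { dct: "http://purl.org/dc/terms/" }],
    "@type": "LearningResource",
    "dct:conformsTo":
      "https://bioschemas.org/profiles/TrainingMaterial/1.0-RELEASE",
    name: material.name,
    description: material.description,
    keywords: material.keywords,
    url: material.url,
  };
}

// Serialises the object into a script tag. "<" is escaped so that text in
// the metadata cannot close the script element early.
function renderJsonLdScript(jsonLd: Record<string, unknown>): string {
  const body = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${body}</script>`;
}

// Made-up example values, for illustration only.
const exampleMaterial: MaterialInput = {
  name: "Example lesson",
  description: "Intro to <HPC> tools",
  keywords: ["hpc", "training"],
  url: "https://example.org/lesson",
};

const exampleScript = renderJsonLdScript(toJsonLd(exampleMaterial));
console.log(exampleScript);
```

Keeping the object construction separate from the rendering means the same object could be reused for the Course or CourseInstance profiles by changing only the first function.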

@eirini-zormpa
Author

eirini-zormpa commented Nov 15, 2024

not entirely sure what it's used for to be honest 😅 as you say though, it's probably a good idea to do it anyway and link up with Toby once we have 😊

Something else to be mindful of when we change the published metadata is renaming some of our metadata fields to be consistent with Bioschemas. Related to that, I'm not completely sure whether we should use the Training Material or the Course metadata.

Here is a list of what I found to be different and how I think it should be:

  • instead of summary use description
  • instead of tags use keywords
  • instead of learningOutcomes use teaches

A few of these I'm not so sure about.

  • For example, we use attribution both to cite the places we reused materials from AND to tell other people how to cite us. As best as I can tell, for materials that were created by UNIVERSE-HPC, we should use the author field. For stuff we're reusing from others, the best option is likely citation. Note though that this only appears as an option for the Course metadata and not the Training Material metadata 😞
  • Arguably, coursePrerequisites is like our dependsOn, though I think the dependsOn field is used for the graph and we may want to list things as prerequisites even if we don't have materials that teach them.

Other stuff we might want to include:

  • timeRequired
  • provider (for courses)
  • hasCourseInstance (for courses)


2 participants