Metadata about the metadata record: subjectOf, about... #13
There are two patterns that could be used to structure the two parts of the metadata record:
Option 1: the root object is the described resource:
Option 2: the root object is the metadata record:
The RDF triples generated by these two approaches are identical. |
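The two code samples did not survive in this copy of the thread, so here is only a minimal Python sketch of the claim. It assumes Option 1 nests the record under `subjectOf` and Option 2 nests the resource under `about`, each with the back link included. The `ex:URIforMetadataRecord` identifier, the property values, and the `triples` helper are illustrative assumptions; the helper is a toy flattener, not a JSON-LD processor.

```python
def triples(node):
    """Toy flattener: collect (subject, predicate, object) from nested dicts.

    Assumption: every node object carries an '@id'. There is no @context
    handling, no blank nodes, and no list support.
    """
    out = set()
    subject = node["@id"]
    for key, value in node.items():
        if key == "@id":
            continue
        if key == "@type":
            out.add((subject, "rdf:type", value))
        elif isinstance(value, dict):
            # Nested node object: link to it, then flatten its own properties.
            out.add((subject, key, value["@id"]))
            out |= triples(value)
        else:
            out.add((subject, key, value))
    return out


RESOURCE = "ex:URIforDescribedResource"
RECORD = "ex:URIforMetadataRecord"  # hypothetical identifier

# Option 1: the root object is the described resource.
option1 = {
    "@id": RESOURCE,
    "@type": "Dataset",
    "name": "Example resource",
    "subjectOf": {
        "@id": RECORD,
        "@type": "DigitalDocument",
        "dateModified": "2024-05-23",
        "about": {"@id": RESOURCE},
    },
}

# Option 2: the root object is the metadata record.
option2 = {
    "@id": RECORD,
    "@type": "DigitalDocument",
    "dateModified": "2024-05-23",
    "about": {
        "@id": RESOURCE,
        "@type": "Dataset",
        "name": "Example resource",
        "subjectOf": {"@id": RECORD},
    },
}

same = triples(option1) == triples(option2)
print(same)
```

The equality only holds here because both sketches state the `subjectOf` and `about` links explicitly; drop one back link and the two sets differ by that triple.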
The metadata about the metadata is important in harvesting/federated catalog systems to keep track of where metadata came from, what format/profile it uses (harvesters need this to process it), and update dates. Many people use the first approach above, in which the root of the schema.org record has "@id": "ex:URIforDescribedResource"; this is the information that generally goes into search indexes. From that point of view the first approach makes more sense, as it follows common practice. The pitfall is that the 'subjectOf' property is widely used for all kinds of things, so care is necessary to find the 'subjectOf' that provides the 'self' information about the metadata record. I suggest modifying the above encoding to include a description string that clearly identifies the subjectOf link to the metadata digital document.
Including the 'about' property with the back link to ex:URIforDescribedResource is useful, but it could be calculated as the inverse of the subjectOf property. |
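To make the "could be calculated" point concrete, here is a small Python sketch that derives the `about` back link from each `subjectOf` statement. The triple representation (plain tuples), the `add_inverse_about` helper, and the `ex:URIforMetadataRecord` identifier are assumptions for illustration only.

```python
def add_inverse_about(triples):
    """Return the input triples plus an 'about' triple for every 'subjectOf' triple."""
    derived = {(obj, "about", subj) for (subj, pred, obj) in triples if pred == "subjectOf"}
    return set(triples) | derived


# Only the forward link is stated explicitly.
stated = {
    ("ex:URIforDescribedResource", "subjectOf", "ex:URIforMetadataRecord"),
    ("ex:URIforMetadataRecord", "dateModified", "2024-05-23"),
}

closed = add_inverse_about(stated)
for triple in sorted(closed):
    print(triple)
```

A consumer that applies this kind of rule does not need the explicit `about`; one that reads the JSON-LD as-is still does, which is the argument for keeping it.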
xref WorldFAIR D11.3, Section 1.2
In the JSON-LD specification, the That URL resolves to a JSON-LD document that (as of the timestamp on this comment) is:
{
"@context": {
"@vocab": "http://schema.org/",
"image": {
"@type": "@id"
}
},
"@id": "http://me.markus-lanthaler.com/",
"@type": "Person",
"name": "Markus Lanthaler",
"honorificPrefix": "Dr.",
"image": "http://www.markus-lanthaler.com/images/markus-lanthaler.jpg",
"url": "http://www.markus-lanthaler.com/",
"nationality": {
"@type": "Country",
"name": "Italy"
},
"jobTitle": "Software Engineer",
"affiliation": {
"@type": "Organization",
"name": "Google",
"url": "http://www.google.com/"
},
"worksFor": {
"@type": "Organization",
"name": "Google",
"url": "http://www.google.com/"
},
"sameAs": [
"https://twitter.com/MarkusLanthaler",
"https://www.google.com/+MarkusLanthaler",
"https://www.linkedin.com/in/markuslanthaler"
]
}
This example uses schema.org semantics, and is typed as Person. To identify the resource described by the metadata, the In the schema.org world, metadata about the metadata record itself can be provided by properties like This is resolvable by having another file (F2) containing metadata about the JSON-LD record above (F1). If one wants to serialise F2 in JSON-LD with schema.org semantics, one has to be a little careful.
F2 would look something like (with as many properties from
{
"@context": {
"@vocab": "http://schema.org/"
},
"@id": "http://metadata.about.me.markus-lanthaler.com/",
"@type": "Dataset",
"name": "Metadata about the Person, Markus Lanthaler, in JSON-LD",
"identifier": "http://me.markus-lanthaler.com/",
"encodingFormat": "application/ld+json",
"dateCreated": "2024-02-13",
"dateModified": "2024-05-23",
"datePublished": "2024-05-23",
"creator": {
"@type": "Organization",
"name": "Some Org",
"url": "http://www.someorg.com/"
},
"maintainer": {
"@type": "Organization",
"name": "Some Org",
"url": "http://www.someorg.com/"
}
}
Naturally, this could end up being metadata inception, so one has to draw the line on explicit representation of metadata about (meta)data somewhere and then rely on, e.g., a file system's native metadata functions. |
@pbuttigieg thanks for the response. I think the solution I suggested is pretty much equivalent to your F2, except that I am proposing to include it inline. I don't see a big problem for processors that don't care about it to ignore it. There are a couple of issues in F2; for instance it generates this triple: what is missing is the 'about' link from F2 to the node it describes which is the I was proposing a more specific statement of the encoding format, pointing to a specific profile. In the wild, it would probably be useful to make this an array including generic and specific formats along the lines of [json, json-ld, specificprofile]. I also propose that the metadata record is better represented as a "DigitalDocument" than "Dataset", since it's a single digital object. perhaps using aside: |
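A minimal Python sketch of how a harvester might use such an array, assuming `encodingFormat` may be a single string or a list running from generic to specific. The profile identifier `https://example.org/profile/specific` and the `pick_format` helper are hypothetical; they are not taken from CDIF or schema.org.

```python
# Formats this hypothetical harvester can process, most specific first.
KNOWN_FORMATS = [
    "https://example.org/profile/specific",  # hypothetical profile identifier
    "application/ld+json",
    "application/json",
]


def pick_format(encoding_format):
    """Return the most specific known format declared in encodingFormat, or None."""
    if isinstance(encoding_format, list):
        declared = encoding_format
    else:
        declared = [encoding_format]
    for fmt in KNOWN_FORMATS:
        if fmt in declared:
            return fmt
    return None


record = {
    "encodingFormat": [
        "application/json",
        "application/ld+json",
        "https://example.org/profile/specific",
    ]
}

print(pick_format(record["encodingFormat"]))
```

Listing the generic formats as well means a harvester that has never heard of the profile can still fall back to plain JSON-LD or JSON handling.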
I'm not sure it is - the way the aboutness is handled is quite different. The R sub-principles in the FAIR principles state that the metadata should persist after any data they describe are deleted. Thus ODIS will recommend keeping (meta)metadata separate.
I do. It seems like an entirely avoidable issue, and at scale it's many operations, thus avoidable energy use. At any rate, CDIF guidance shouldn't prescribe one or the other.
I think that's consistent- the identifier value space is different from the
As mentioned and explained above, I think this isn't correct. F2 isn't about the described resource.
As above, I don't think that's right. F2 is about F1, not the thing F1 describes.
I'm not sure what is meant here.
Both work, Dataset is more useful and accurate IMO
risky - names are fickle
I'll think a bit more, but my first approximation is that it's a lot about the predicate. |
Next iteration. @id can identify a JSON-LD object or the thing that that object is about. It's ambiguous as defined in the various specs.
Proposed CDIF solution: Convention - include the schema:identifier property to identify a thing in the world that is the subject of a JSON-LD graph node. The first-guess default is then that @id identifies the 'representation', i.e. the JSON object that contains the @id element.
Longer explanation from the draft CDIF handbook:
In a harvesting/federated catalog system, some metadata about the metadata is important to keep track of where metadata came from, what format/profile it uses (harvesters need this to process it), and update dates (see Metadata Content Requirements). Unambiguous expression of this information requires making statements about a metadata record distinct from the thing in the world that the metadata describes (see GitHub issues 1, 2). In an RDF framework, this requires a distinct identifier for the metadata record object that will serve as the subject for these triples.
Schema.org includes several properties that can be used to embed information about the metadata record in the resource metadata: sdDatePublished, sdLicense, sdPublisher. It lacks a way to provide an identifier for the metadata record distinct from the resource it describes, to specify agents responsible for the metadata other than the publisher, or to assert specification or profile conformance for the metadata record itself.
In the RDF serialization, Schema.org metadata records are JSON-LD node objects, and include an "@id" keyword with a value that identifies the node. This identifier can be interpreted to represent a thing in the world that the metadata record (the 'node') is about, or to represent the metadata record (a JSON object) itself. Here is a short example record (other '@' properties are explained below):
When this JSON-LD is converted to RDF triples (e.g. using the JSON-LD playground), this results:
The interpretation of the first two sets of triples would be that they are statements about the thing in the world that the metadata record is about. The third triple is ambiguous: was the metadata content modified, or the described resource in the world? There does not seem to be any recognized best practice or consensus for dealing with this issue, so CDIF defines these conventions.
Use the schema.org identifier property to identify a thing in the world that is the subject of the JSON-LD node. The identified thing might be physical, imaginary, abstract, or a digital object. The JSON-LD @id property identifies a node in a graph, and can be interpreted in different ways; as a URI it is expected to dereference to produce the same JSON-LD object in which it is defined.
Given this convention, when the metadata record is processed, the processor should use the schema:identifier as the subject of triples about the subject of the metadata record to avoid ambiguity. In addition, this convention would suggest that if a schema:identifier property is present, the @id property should be interpreted to identify the JSON object that is the representation of the node in the knowledge graph.
Statements about the metadata record as a distinct entity should be made using a separate identified node object. This node object can be embedded in the metadata record about the resource in the world (Example 1 below), or published as a separate node (Example 2 below).
Example 1. Metadata about the metadata embedded.
Example 2. Metadata about metadata as a separate graph node.
Including the schema:description with the string "metadata about documentation for ex:URIforDescribedResource" will allow disambiguating different usages of the subjectOf property. The ex namespace in the example above is only included so the example is valid; actual metadata would likely have its own namespace for resource and metadata URIs. The distinct identifier for the metadata record (ex:URIforNode1) allows statements to be made about the metadata separately from statements about the resource it describes. |
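The identifier convention proposed in this comment (schema:identifier names the thing in the world; @id then names the JSON object) can be sketched in Python as follows. This is only one reading of the convention: the `interpret_ids` helper is hypothetical, and it handles only a plain string `identifier`, not a PropertyValue or a list.

```python
def interpret_ids(node):
    """Split a node's identifiers into 'thing' and 'representation'.

    Assumed reading of the convention:
    - identifier present: it names the described thing, and @id names the
      JSON object (the representation).
    - identifier absent: @id is taken to name the described thing, and the
      representation has no identifier of its own.
    """
    identifier = node.get("identifier")
    node_id = node.get("@id")
    if isinstance(identifier, str):
        return {"thing": identifier, "representation": node_id}
    return {"thing": node_id, "representation": None}


with_identifier = {
    "@id": "ex:URIforNode1",
    "identifier": "ex:URIforDescribedResource",
    "name": "Example resource",
}
without_identifier = {
    "@id": "ex:URIforDescribedResource",
    "name": "Example resource",
}

print(interpret_ids(with_identifier))
print(interpret_ids(without_identifier))
```

A processor following this reading would use the `thing` value as the subject of triples about the resource, which is the departure from standard JSON-LD-to-RDF conversion discussed in the later comments.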
another possible solution:
The metadata record can use @id with the identifier for the described resource, so the generated triples with @id make sense. The node with information about the metadata record links to its target metadata using the sdo:url property, under the interpretation that dereferencing a node identifier should return the JSON object that has that @id. This seems less divergent from common usage than the sdo:identifier approach suggested above. Better yet, instead of "url": ...
seems clearer to me. Including an "additionalType":"ex:metadataDocumentation" or something like that in the metadata about metadata node would also help clarify things. |
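A short Python sketch of how a consumer could use such a marker to pick the metadata-about-metadata node out of several `subjectOf` values. The `ex:metadataDocumentation` value is the placeholder from the comment above; the `find_metadata_nodes` helper, the sample resource, and the `ex:somePaper` entry are assumptions for illustration.

```python
METADATA_TYPE = "ex:metadataDocumentation"  # placeholder value from the comment above


def as_list(value):
    """Normalize a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_metadata_nodes(resource):
    """Return the subjectOf node objects whose additionalType marks them as metadata documentation."""
    return [
        node
        for node in as_list(resource.get("subjectOf"))
        if isinstance(node, dict) and METADATA_TYPE in as_list(node.get("additionalType"))
    ]


resource = {
    "@id": "ex:URIforDescribedResource",
    "@type": "Dataset",
    "subjectOf": [
        # An unrelated use of subjectOf, e.g. a paper about the dataset.
        {"@id": "ex:somePaper", "@type": "ScholarlyArticle"},
        # The node carrying metadata about the metadata record.
        {
            "@id": "ex:URIforNode2",
            "@type": "DigitalDocument",
            "additionalType": "ex:metadataDocumentation",
            "dateModified": "2024-05-23",
        },
    ],
}

matches = find_metadata_nodes(resource)
print([node["@id"] for node in matches])
```

Matching on a typed value rather than on a free-text description sidesteps the "names are fickle" concern raised earlier, at the cost of agreeing on the type identifier.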
suggested solution
The "@id": "ex:URIforNode2" is analogous to dcat:CatalogRecord. This approach allows constructing RDF triples from the JSON-LD according to the standard practice (using the @id as the subject) instead of having to use the sdo:identifier as the subject. |
Decision from discussions at Dagstuhl 2024-10: Decision is to recommend serialization like this:
or the equivalent serialization as a graph with two distinct nodes:
|
The CDIF discovery document asserts: A metadata record has two parts; one part is about the metadata record itself, the other part is the content about the resource that the metadata documents. The part about the record specifies the identifier for the metadata record, agents with responsibility for the record, when it was last updated, what specification or profiles the metadata serialization conforms to, and other optional properties of the metadata that are deemed useful.
Schema.org includes several properties that can be used to embed information about the metadata record in the resource metadata: sdDatePublished, sdLicense, sdPublisher, but lacks a way to provide an identifier for the metadata record distinct from the resource it describes, to specify other agents responsible for the metadata except the publisher, or to assert specification or profile conformance for the metadata record itself.