CAML
CAML is a recursive acronym for "CAML is Another Mapping Language". It is the native mapping language of the schemaorg-pipeline library, used to write a data map between schema.org terms and the source data.
A data map in CAML is composed of one or more mapping definitions. A mapping definition is written as a key-value pair separated by a colon and at least one space, where the key is the schema.org keyword and the value is either a data path, a data object or a constant value.
A data path represents the physical location of the data in the source. The notation always starts with a slash (/) followed by a node name. The slash is also the delimiter that separates the node names in the path. For example:
name: /Dataset/Title
description: /Dataset/Description
keyword: /Dataset/Keywords/Keyword
(Note that the left-hand side is the schema.org term and the right-hand side is the data path.)
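As a rough illustration of the syntax described above, the following Python sketch splits a mapping definition into its key and value and breaks a data path into node names. The helper names `parse_definition` and `split_path` are hypothetical and not part of schemaorg-pipeline; the library's own parser may behave differently.

```python
import re

# Hypothetical pattern: a key, a colon, at least one whitespace character, then the value.
DEFINITION = re.compile(r"^\s*(?P<key>[^\s:]+):\s+(?P<value>.+?)\s*$")


def parse_definition(line):
    """Split one mapping definition into (key, raw value)."""
    match = DEFINITION.match(line)
    if match is None:
        raise ValueError(f"not a mapping definition: {line!r}")
    return match.group("key"), match.group("value")


def split_path(path):
    """Return the node names of a data path such as /Dataset/Title."""
    if not path.startswith("/"):
        raise ValueError(f"a data path must start with a slash: {path!r}")
    return path[1:].split("/")


key, value = parse_definition("keyword: /Dataset/Keywords/Keyword")
print(key, split_path(value))
```

The sketch only distinguishes the two sides of a definition; deciding whether the value is a data path, a constant, or the start of a data object is left out.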
A data object is a group of mapping definitions at the same indentation level. For schema.org compatibility, every data object must have a type definition, indicated by the keyword @type and followed by the corresponding schema.org type name. For example:
distribution: /Dataset/Distributions/Distribution
    @type: 'DataDownload'
    contentUrl: /AccessUrl
    fileFormat: /Format
    publisher: /Source
(Lines 2-5 form the data object of type DataDownload.)
When a data object is nested within another data object, the mapping must first define the root data path and then attach the data object to it. Consequently, all data paths inside the nested data object are relative to that root path. In the example above, the DataDownload object has the root path /Dataset/Distributions/Distribution, and the data paths that follow start from that root to get the content URL, file format, and publisher.
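The root-path rule can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under assumptions: the XML sample is made up to match the paths in the DataDownload example, the `select` helper is hypothetical, and schemaorg-pipeline may resolve paths differently.

```python
import xml.etree.ElementTree as ET

# Made-up source document that matches the paths used in the example.
SOURCE = """
<Dataset>
  <Distributions>
    <Distribution>
      <AccessUrl>http://example.org/data.csv</AccessUrl>
      <Format>CSV</Format>
      <Source>Example Lab</Source>
    </Distribution>
  </Distributions>
</Dataset>
"""


def select(node, path):
    """Find the elements for a slash-delimited path, relative to `node`."""
    return node.findall("." + path)


# Wrap the document so a path can begin with the name of the root element.
wrapper = ET.Element("wrapper")
wrapper.append(ET.fromstring(SOURCE.strip()))

# Paths of the nested data object, each relative to the root path.
nested = {"contentUrl": "/AccessUrl", "fileFormat": "/Format", "publisher": "/Source"}

distributions = []
for root_node in select(wrapper, "/Dataset/Distributions/Distribution"):
    item = {"@type": "DataDownload"}
    for key, path in nested.items():
        item[key] = [element.text for element in select(root_node, path)]
    distributions.append(item)

print(distributions)
```

Each element matched by the root path yields one DataDownload object, and the nested paths are evaluated against that element rather than against the whole document.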
A constant value is any text enclosed in single quotation marks. Use a backslash as the escape character within the text. For example:
@type: 'Dataset'
inLanguage: 'EN'
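A minimal Python sketch of how a quoted constant might be decoded is shown below. The function name `parse_constant` is hypothetical, and the sketch assumes that a backslash simply makes the next character literal; the set of escapes that CAML actually supports is not specified here.

```python
def parse_constant(text):
    """Decode a single-quoted constant, treating a backslash as the escape character."""
    if len(text) < 2 or text[0] != "'" or text[-1] != "'":
        raise ValueError(f"not a quoted constant: {text!r}")
    body = text[1:-1]
    result = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # Assumption: the escaped character is taken literally.
            result.append(body[i + 1])
            i += 2
        else:
            result.append(ch)
            i += 1
    return "".join(result)


print(parse_constant("'EN'"))
print(parse_constant(r"'Alzheimer\'s Disease'"))
```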
An array is constructed by writing multiple mapping definitions with the same key. For example:
identifier: /Dataset/Identifier
identifier: /Dataset/SecondaryIdentifier
identifier: /Dataset/Others/PMID
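The repeated-key rule could be modelled in Python as follows. The `collect` helper is hypothetical and only shows the grouping idea; it is not the library's implementation.

```python
def collect(definitions):
    """Group (key, value) pairs; a key that appears more than once becomes a list."""
    result = {}
    for key, value in definitions:
        if key not in result:
            result[key] = value
        elif isinstance(result[key], list):
            result[key].append(value)
        else:
            result[key] = [result[key], value]
    return result


data_map = collect([
    ("name", "/Dataset/Title"),
    ("identifier", "/Dataset/Identifier"),
    ("identifier", "/Dataset/SecondaryIdentifier"),
    ("identifier", "/Dataset/Others/PMID"),
])
print(data_map)
```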
A pair is two constant values separated by a comma and enclosed in round brackets. It is useful for assigning two strings in a single mapping definition. For example:
@prefix: ('schema', 'http://schema.org/')
@prefix: ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#')
@prefix: ('rdfs', 'http://www.w3.org/2000/01/rdf-schema#')
(For now, this expression is used exclusively by the @prefix keyword.)
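As an illustration, a pair could be read with a small regular expression in Python. The name `parse_pair` is hypothetical, and the sketch assumes that neither constant contains an escaped quotation mark.

```python
import re

# Two single-quoted constants separated by a comma, inside round brackets.
PAIR = re.compile(r"^\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)$")


def parse_pair(text):
    """Return the two strings of a pair such as ('schema', 'http://schema.org/')."""
    match = PAIR.match(text.strip())
    if match is None:
        raise ValueError(f"not a pair: {text!r}")
    return match.group(1), match.group(2)


# Build a prefix table from several @prefix values.
prefixes = dict(parse_pair(value) for value in [
    "('schema', 'http://schema.org/')",
    "('rdfs', 'http://www.w3.org/2000/01/rdf-schema#')",
])
print(prefixes)
```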
concat is a function that concatenates a value from the data source with one or more constant strings. Example usages:
=concat(/dataset/identifier, '-ID')
=concat('ID-', /dataset/identifier)
=concat('http://identifier.org/mesh/', /dataset/identifier, '-ID')
Assuming /dataset/identifier contains the value "12345", the function calls give the following outputs:
"12345-ID"
"ID-12345"
"http://identifier.org/mesh/12345-ID"
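The behaviour described above can be reproduced with a short Python sketch. It assumes the arguments have already been split into a list and that a plain dictionary stands in for the data source; both the `concat` signature and the `source` dictionary are illustrative, not the library's API.

```python
def concat(arguments, values):
    """Join constants and looked-up data path values in argument order."""
    parts = []
    for argument in arguments:
        if argument.startswith("'") and argument.endswith("'"):
            # A single-quoted constant: use the text between the quotes.
            parts.append(argument[1:-1])
        else:
            # Otherwise treat the argument as a data path and look up its value.
            parts.append(values[argument])
    return "".join(parts)


# Stand-in for the source data.
source = {"/dataset/identifier": "12345"}

print(concat(["/dataset/identifier", "'-ID'"], source))
print(concat(["'ID-'", "/dataset/identifier"], source))
print(concat(["'http://identifier.org/mesh/'", "/dataset/identifier", "'-ID'"], source))
```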
CAML also defines the following special keywords:
- @id: (optional) indicates the unique identifier of the instance. If present, the value is used for filtering in the data extraction step.
- @type: (mandatory) indicates the schema.org type of the instance.
- @prefix: (optional) specifies a prefix definition used by the source data.
The example below shows a data map used to generate schema.org markup from XML documents on the ClinicalTrials.gov website.
@type: 'MedicalTrial'
name: /clinical_study/official_title
alternateName: /clinical_study/brief_title
alternateName: /clinical_study/acronym
identifier: /clinical_study/id_info/org_study_id
identifier: /clinical_study/id_info/nct_id
identifier: /clinical_study/id_info/secondary_id
status: /clinical_study/overall_status
description: /clinical_study/detailed_description/textblock
studySubject: /clinical_study/condition
phase: /clinical_study/phase
code: /clinical_study/condition_browse
    @type: 'MedicalCode'
    codeValue: /mesh_term
    codingSystem: 'MeSH'
sponsor: /clinical_study/sponsors/lead_sponsor
    @type: 'Organization'
    name: /agency
    additionalType: 'Lead Sponsor'
sponsor: /clinical_study/sponsors/collaborator
    @type: 'Organization'
    name: /agency
    additionalType: 'Collaborator'
studyLocation: /clinical_study/location/facility
    @type: 'AdministrativeArea'
    name: /name
    additionalType: 'Facility'
    address: /address
        @type: 'PostalAddress'
        addressLocality: /city
        addressRegion: /state
        postalCode: /zip
        addressCountry: /country
Please visit the playground (Try Example > Example CAML: Annotate ClinicalTrials.gov XML document) to see the full-length map and a live demo of evaluating this mapping.
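To show how indentation groups mapping definitions into nested data objects, here is a rough Python sketch that turns an indented excerpt of a ClinicalTrials.gov-style map into a tree. The `parse_map` function and its node layout are hypothetical, and the four-space indentation in the sample is an assumption; the sketch compares relative depth only and is not the parser used by schemaorg-pipeline.

```python
SAMPLE = """\
@type: 'MedicalTrial'
name: /clinical_study/official_title
studyLocation: /clinical_study/location/facility
    @type: 'AdministrativeArea'
    name: /name
    address: /address
        @type: 'PostalAddress'
        addressLocality: /city
"""


def parse_map(text):
    """Build a nested structure from indented 'key: value' lines."""
    root = {"key": None, "value": None, "children": []}
    stack = [(-1, root)]  # (indentation, node) pairs from the root to the current node
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = line.strip().partition(": ")
        node = {"key": key, "value": value.strip(), "children": []}
        # Close every object that is at the same depth or deeper than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((indent, node))
    return root


tree = parse_map(SAMPLE)
location = tree["children"][2]
print(location["value"], [child["key"] for child in location["children"]])
```

In this layout, a definition with children is a data object whose value is the root path, and the children carry the paths relative to it.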