In this section we list all legal mapping patterns that can be used to express how instances of VO-DML-defined types are represented in a VOTable and the possible roles they play. It defines the VOTable annotation syntax, what restrictions there are, and how to interpret the annotation semantically.
The organization of the following sections is based on the different VO-DML concepts that can be represented. Each of these subsections contains sub-subsections which represent the different possible ways the concept may be encountered in a VOTable and discuss rules and constraints on those annotations. We start with Model, and then we discuss value types (PrimitiveType, Enumeration and DataType) and Attributes. Then ObjectType and the relationships, Composition (collection), Reference and Inheritance (extends). Package is not mapped: none of the use cases required this element to be actually mapped to a VOTable instance.
(TODO See whether we should define a list of all the legal mapping patterns in an Appendix.)
Some comments on how we refer to VOTable and VO-DML elements (TODO TBD redundant with text above in section ...)
-
When referring to VOTable elements we will use the notation by which these elements will occur in VOTable documents, i.e. in general “all caps”, E.g. GROUP, FIELD, (though FIELDref).
-
When referring to rows in a TABLE element in a VOTable, we will use TR, when referring to individual cells, TD. Even though such elements only appear in the TABLEDATA serialization of a TABLE. When referring to a column in the TABLE we will use FIELD, also if we do not intend the actual FIELD element annotating the column.
-
When referring to an XML attribute on a VOTable element we will prefix it with a ‘@’, e.g.
@id
,@ref
. -
References to VO-DML elements will be capitalized and in bold face,
-
using their VO-DML/XSD type definitions. E.g. ObjectType, Attribute.
The following list defines some shorthand phrases (italicized), which we use in the descriptions below:
-
Generally when using the phrase meta-type we mean a "kind of" type as defined in VO-DML. These are PrimitiveType, Enumeration, DataType and ObjectType.
-
With atomic type we will mean a PrimitiveType or an Enumeration as defined in VO-DML.
-
A structured type will refer to an ObjectType or DataType as defined in VO-DML.
-
With a property available on or defined on a (structured) type we will mean an Attribute or Reference, or (in the case of ObjectTypes) a Collection defined on that type itself, or inherited from one of its base class ancestors.
-
A VO-DML Type plays a role in the definition of another (structured) type if the former is the declared data type of a property available on the latter.
-
When writing that a VOTable element represents a certain VO-DML type, we mean that the VOTable element is mapped either directly to the type, or that it identifies a role played by the type in another type’s definition.
-
A descendant of a VOTable element is an element contained in that element, or in a descendant of that element. This is a standard recursive definition and can go up the hierarchy as well: an ancestor of an element is the direct container of that element, or an ancestor of that container.
Regarding the normative aspects of this specification
When we say this section is NORMATIVE we mean that:
- when a client finds an annotation pattern conforming to one defined here, that client is justified in interpreting it as described in the comments for that pattern. It is an ANNOTATION ERROR if that were to lead to inconsistencies1.
- when a client encounters a pattern not in this list, the client SHOULD ignore it. Interpreting it as a mapping to a data model MAY work, but is not mandated and other clients need not conform to this.
(TODO Is this section supposed to come after DataType as it inherits all of the DataType's patterns, e.g. Attributes?)
A DataType instance (also a value) is structured; it consists of
values assigned to each of its attributes and possibly references. To
represent the complete instance of a DataType the various attributes
(and references) must be grouped together in an INSTANCE
;
If all values are stored exclusively inside this element or the elements
its descendants point to and are not stored in any table elements,
then the INSTANCE
is said to directly
represent a complete instance.
If even only one of the attributes is stored in a FIELD
and accessed
through a reference by any of the INSTANCE
's descendants,
then the INSTANCE
is said to indirectly represent
possibly multiple
instances, one for each TR
.
We will use these terms, direct and indirect representation all through the document.
The attribute values of a DataType are stored according to the prescription for storing instances of their data type. For attributes with declared data type a PrimitiveType or Enumeration section provides details. If the attribute’s data type is itself a DataType the prescription in the current section should be used recursively. DataTypes can also have References to ObjectTypes, but a discussion of how to store References is deferred to [@sec:references], after the discussion of storing ObjectType instances, which was provided in section [@sec:objecttype].
(TODO OL: Should this be supported at all? It could just be mentioned that a DataType instantiated as root is not supported by the standard, because they should always be enclosed in ObjectType instances, but clients should try and interpret them anyway).
Comments:
The possible role this instance plays in the VOTable document must have been defined outside of any mapping to a data model. After all, in contrast to instances of ObjectTypes, the existence of instances of value types need not be explicitly stated (see the description of value types in [1]). An example could be a VOTable containing the result of a simple cone search. A RESOURCE in that document might contain the position of the cone, which may be mapped through its to a DataType “src:source.SkyCoordinate”.
(TODO Again: is this required? It could just be a special case clients should be encouraged to support).
Note that the a DataType instance is allowed
to be contained in an INSTANCE
without a type. There are some restrictions. NONE of its ancestor
INSTANCE
s MAY be mapped to a data model element. This would conflict with
rules of mapping such an ancestor INSTANCE
that state that children of such
INSTANCE
s MUST be mapped to Roles of the structured type it represents.
An example of this pattern is the case of an assumed DAL protocol, say SCS that defines implicitly some data structure that has components that may be represented by elements from a data model as I the following example.
Restrictions:
-
The Attribute identified by the nested
INSTANCE
MUST be available to the structured type represented by the containingINSTANCE
. -
The attribute
INSTANCE
SHOULD (TODO OL: let's make sure information is equally available to naive and guru clients... if it is, we might even remove this altogether rather than having a SHOULD) also declare the actual structured type it represents, through itsTYPE
. This type MUST be a valid type for the Attribute, i.e. the Attribute's declared type or one of its subtypes. If the type in the serialization is a subtype, theTYPE
MUST declare that.
PrimitiveTypes and Enumerations represent atomic values. They
appear in serializations of data models predominantly in their role as
attributes defined on structured types (ObjectType or DataType)
and this specification focuses on that role. VO-DML assumes
the existence of a standard data model, namely the ivoa
model
defined in [@vodml], defining primitive types such as ivoa:integer,
ivoa:string etc. But there is no new semantics implied by these types
beyond the ones their counterparts in the VOTable schema carry2.
(TODO much of the above is probably obsoleted by the new mapping strategy)
This indicates a PARAM that is not contained in a VODML-annotated
INSTANCE
, has a VODML/TYPE
identifying an atomic type and no VODML/ROLE
.
(TODO TBD do we want to support this? For if so, what about FIELDs? OL: NO!)
TODO
- VODML/OPTION
- value to literal
- value to SKOS concept
TODO
Restrictions
-
The Attribute identified by the element MUST be available to the structured type represented by the containing
INSTANCE
. -
The reference MUST identify a
FIELD
in aTABLE
. TheFIELD
is guaranteed to be unique by the XML/VOTable rules forID
s. -
The Attribute MUST have an atomic datai type or an
ivoa:Quantity
as described in [@sec:quantity]. -
The
@datatype
of the referencedFIELD
SHOULD be compatible with the declared datatype of the Attribute. E.g. an integer (ivoa:integer) Attribute SHOULD be represented by VOTable’s int or long. (TODO OL: what is a client supposed to do if it is not?)
Restrictions
- The Attribute’s datatype MUST identify an atomic type, i.e.
a PrimitiveType or Enumeration, or an
ivoa:Quantity
as described in [@sec:quantity] - The PARAM
@datatype
SHOULD be a compatible, valid serialization type for the datatype of the Attribute.
(TODO somebody thinks we should not even support this. So should we?)
Restrictions
-
The Attribute must be available to the structured type represented by the containing
INSTANCE
. -
The
PARAM
referencd by the reference MUST obey the restrictions defined for the annotation of a PARAM by the Attribute’s type in {@sec:attribute-value}.
(TODO Should we support references to INSTANCEs as DataType attributes?)
A Reference is a relation between a structured type (the “referrer”, an ObjectType or DataType) and an ObjectType, the “target object”, or “referenced object”. The reference is a property defined on the referrer; it is one of the roles an ObjectType can play in the definition of another structured type. Reference is a many-to-1 relation, many referrers can reference the same target object. The implication of a Reference relation is that the referenced object is somehow “used” by the referrer. Often, especially when the referrer is a DataType, it defines reference data that helps interpret the values its attributes assume. See the VO-DML document [@vodml] for more information.
In explicit serialization languages (TODO must define and discuss this term earlier in this document, say in section 3) such as XML Schema or the relational model there is support for representing this relation. E.g. XML schema allows one to define an attribute of type ID, which must have a value unique in the document. Another element can “refer” to this element using an attribute of type IDREF. It does so by assigning a value to the IDERF attribute and that same value MUST have been assigned to the ID attribute of another element for the document to be valid. This equality of values implies the relationship between the two elements, but there is in the definition no restriction on the type of the referenced element.
If a restriction on the referenced element is desired, XML Schema also allows one to define a “key/keyref” combination. A key has a unique name and is defined using an XPath (TODO REF) “selector”, identifying a set of elements in a document and one or more fields that define the unique values distinguishing between these elements. A keyref can similarly be defined by the name of the target key, a selector identifying the referrer elements and the fields used for identifying the specific referenced elements among the ones defined by the key.
In a relational database a column on one table can be defined as a (primary) key, its values must be unique in that table3. Another table (or even the same one) can declare a column to be a “foreign key”, which “references” the first table through its key column.
In all three cases the reference relationship is implemented using pairs of elements which take up the same value, the ID or key, being referenced using the IDREF and keyref/foreign key column(s). These elements can be seen as addresses and pointers, the way by which similar relationships can be made in programming languages, sometimes explicit, as in C/C++, sometimes implicit as in Java/C#.
The mapping prescription laid out in this specification follows the same approach. It identifies three different patterns for representing references between instances of Types. Their difference lies in the way the target object is represented and identified, which puts demands on the way to identify them on the referrer objects. If the target object is serialized in the same document as the referrer it may be either represented as an individual object represented by an INSTANCE or an object in a table.
The specification also allows for the case where the target object is not represented in the same VOTable document as the referrer. It may exist in another VOTable, or in some alternative, possibly standardized serialized form. An example of the latter could be an XML document following a standardized XML schema explicitly defined to represent the data model, or possibly a location in some publicly accessible database with a relational schema mapped to the model in a standardized Object-Relational manner.
This remote reference pattern is potentially very useful. By locating certain standard, reference data objects in some well-defined place, possibly a registry, different documents can reference the same instance in an explicit and interoperable manner. In this case, instead of providing a serialization of (part of) the referenced object with every file that uses it, a reference to the external serialization could be sufficient.
As an example consider defining a set of standard space-time coordinate reference frames and registering these in an IVOA sanctioned registry, with a well-defined and unique IVOA Identifier. Client tools could be coded to understand references to such standard data sets, so that their identifier might suffice to express the reference. Alternatively they might cache such globally persistent instances so to minimize de-referencing. Proper support for such standard reference data requires some standardized form for their registration and storage, an effort that should be part of the creation of standard data models and should have maintenance responsibilities similar to that of the UCD vocabulary [@ucd].
Restrictions
-
The Reference MUST be available on the StructuredType identified by the referred instance.
-
The OjectType identified MUST be compatible with the datatype of the Reference, i.e. must either be the same type or one of its subtypes.
This pattern indicates that each row in a TABLE contains (at least) two objects S and T, and that S references T through the reference.
Reference to object(s) in a TABLE II: objects in different rows, possibly different TABLE ### { #sec:reference-table-ii }
This is the most complex pattern in this specification. It is the analogue of a foreign key constraint in a relational database. The pattern represents a reference R, contained in a source object S representing a structured type, referencing a target instance T. S and T may be contained in the same TABLE, but in general this need not be the case.
This pattern indicates that each row in the TABLE contains a corresponding structured object, which has a reference to an object represented by T, located in some row in its TABLE. The reference is represented by R.
This pattern identifies a reference to an object serialized elsewhere. Such an instance MUST be identified using an IVOA Identifier [@ivoid] if it is stored in an IVOA Registry. However, a URL might be provided as a convenient hook for the client.
Here it is assumed that the DM working group maintains a set of data model instance documents for various instruments and stores these in some registry.
A Composition is defined in VO-DML as a “whole-parts” relation between parent and child ObjectTypes. The child can be interpreted as defining a part of the parent, the parent is composed of the child objects. Generally a parent can have a collection of 0..N children of a certain type, where N may be unbounded, but the multiplicity can be more constrained. The relation between the child and its parent is much stronger than the relation between a source and a target in a Reference relation. In particular the child’s life-cycle is tightly coupled to its parent: a child only has one parent and when the parent is deleted, so is the child. Because of this also the serialization of the relationship may in many cases be more direct than the indirect reference mapping patterns of the previous section.
For example, in faithful XML serializations of a data model one will
generally map child objects as elements directly contained by the parent
element. It is possible to do so in VOTable as well, having a child
object represented by an INSTANCE
that is contained within the INSTANCE
representing the parent. This pattern is the most natural for this
relationship because a collection element is really to be considered as
a part of the parent object.
This containment of INSTANCE
s can be used for individual objects in an
obvious manner. But it may also be used to represent a flattening of the
parent-child relation in a TABLE
. One or more child objects may be
stored together with the parent object in the same row in a TABLE
. Such
a case is actually very common, for example when interpreting tables in
typical source catalogues. These generally contain information of a
source together with one or more magnitudes. The latter can be seen as
elements from a collection of photometry points contained by the
sources. (See the sample data model).
Faithful object-relational mapping of composition relations does generally not allow for such a containment. The typical pattern is that the table representing the child ObjectType has a foreign key to the table representing the parent. This specification allows for this mapping pattern as well, using the ORMReference pattern from section [@sec:reference-table-ii].
INSTANCE
C represents a child object in a Collection on some
ObjectTupe identified by its Role, and of DataType
possibly identified by its Type. In this pattern C is contained
inside of an INSTANCE
P that represents an ObjectType that is a valid
Container of the Collection.
No statement is made about the location of the parent INSTANCE
. Both parent and
child may represent individual, global objects. If their instances refer to
TABLE
s, each row stores both parent and child objects.
Note also that there may be multiple child INSTANCE
s identifying the same
composition relation in the same parent INSTANCE
.
Every instance of an ObjectType inherits from its implicit ultimate super type vodml-map:ObjectTypeInstance its vodml-map:ObjectTypeInstance.container property. In the vodml-map model this is represented as a collection of type vodml-map:Reference, with multiplicity 0..1. This should be interpreted as the possibility of adding on a serialized ObjectType instance a reference to its container/parent object.
In the case of serializing to a VOTable, this allows one to implement an object relational mapping pattern for the parent-child relation as described in the introduction to this section.
The INSTANCE
CH represents a collection of child objects that are contained in
parent objects represented by INSTANCE
P. CH and P are generally stored in
different tables. CH references P following the ORM reference pattern described
in section [@sec:reference-table-ii] above.
The inheritance relation between two Type-s might seem not to be relevant for serializations and mapping. After all the relation implies no relation between objects, the target of serializations, only between types. In serializations, any inherited Role can appear on a sub-type and, as described above, all properties inherited from a super-type are identified with the vodml-id defined on the super-type.
There are some special cases though where inheritance makes a non-trivial appearance. The first we have seen already, in that we can explicitly cast a Role’s value to a sub-type of a declared datatype. See [@sec:clients] for a discussion on how different types of clients can deal with this.
There are however subtler issues related to inheritance mapping. An example comes from a typical object-relational mapping pattern for inheritance hierarchies. When creating a relational model for an object model with inheritance hierarchies, usually there are three patterns people will follow.
One may choose to map each concrete (i.e. non-abstract) type to its own table. But one may also choose to map all sup-types of a given (ultimate) super type to a single table. Different rows may store different sub-types, and certain columns defined for storing an attribute defined on one sub-type will in general be NULL on its siblings. Or one may choose a mixture of these where there is a table for each type, but these only store elements defined on that type; sub-types have a foreign key to their parent and to retrieve a complete subtype this pointer must be followed, possibly recursively until the ultimate super-type. Tools such as Hibernate [@orm] provide explicit support for these inheritance mapping patterns, and we point the reader to their documentation for more information.
In the latter two patterns there will generally be a table, either the single one storing the whole hierarchy, or the one representing the ultimate super-type, which defines a special column that allows one to decide which concrete sub-type is stored in a particular row. This may be through some pre-agreed code, or by storing the complete name of the type.
This pattern can be supported as well.
(TODO The above description assumes one wants to serialize instances of
multiple types in the same table or set of tables. Do we have a use case for
this? If this is not required, this section should just highlight that
the vodml-ref
s for an instance of a subtype are inherited from the
supertype.)
The extending type inherits all of the parent type’s Attributes and their vodml-refs (including the prefixes), and adds its own Attributes and their vodml-refs (including the prefixes).
The extending type also inherits all of the parent’s ancestors, recursively.
(TODO This section is a placeholder for a potential mechanism similar to what we had for value, unit, and ucd in the original document. We might decide to have an entirely different mechanism and remove this section.)
VOTable is expressive enough to allow the mapping patterns described in
this specification. Other formats (notably FITS) cannot support such
annotations [@fits]. A
solution to this, which is even one of the motivations behind the
VOTable specification, is to annotate other format’s tables with a
VOTable wrapper. VOTable (see [@votable], section 5.2) already supports
wrapping FITS tables using a TABLE
/DATA
/FITS
. This has a STREAM/@href
URL to specify which FITS file is represented and an @extnum
attribute
to indicate which FITS extension is used.
Some FITS formats also allow a VOTable header to be included in an HDU. Whilst this would allow data and meta-data to be combined in a single file, this solution seems to be less portable, since some FITS readers do not support these files.
A similar solution could be used to annotate the tables in the schema of
a TAP service. Here no specific support yet exists, but could be easily
added to the TAP specification. I fact a VOTable representing all TAP
tables as “empty” TABLE
elements with only header, no DATA
information
has been proposed in the past as one of the ways by which a TAP schema’s
metadata could be declared. If this were to be a formalized option it
would automatically allow the mapping patterns from the current
specification to be included. The main challenge for TAP is whether that
annotation can be carried over to the results of queries against its
schema. For the simple, column-only annotations such as UCDs or
descriptions, this is already a non-trivial task, and not always
possible4. General ADQL queries can easily unravel the GROUP
-ing of
columns and their nesting and produce results that cannot be annotated
with concepts of the original data models. However we believe this is a
90-10 problem, and for the most common queries it should be possible to
identify the data model concepts with not much more work than is already
required to carry along the UCDs.
This should in particular not be a problem for the results of standard protocols such as the simple cone search. There the service provider is in charge of designing the result set, which, being tabular, can be easily annotated using the current specification. In fact, in the absence of a TAP_SCHEMA-like mechanism, it could be argued that the current specification presents a perfect mechanism by which an SCS service can declare the contents of its holdings.
Whereas similar approaches can likely be used for other tabular formats, an interesting question is whether mapping patterns can be defined also for annotating serialization formats other than tables. In particular the more structured formats such as XML or JSON, or more modern ones such as Google Protocol Buffers5 or Apache Avro6, would pose different problems.
The first approach would be to see if and how such formats can be used to provide faithful representations as defined in section 3. For XML it has been shown in a similar effort (see [@simdm] and [@vourp]) that one can derive an XML schema that represents the types and relations of a well-defined data model in a 1-1 fashion. It is even possible to do so in an automated manner using code generation and we believe it would be a useful effort to define such . As long as a well-defined meta-model exists for the target serialization, and as long as it is rich enough we believe this will carry over to other serialization formats.
The problem is when one want to annotate existing XML documents that have not been designed as such. It has as yet not been explore whether a simple annotation mechanism such as the current one will work there as well. We believe that if the concepts of a data model can be identified in an informal manner, it may be possible to at least partially reproduce the target serialization as a view on a faithful serialization. For example one might define an XSLT document that, when working on a faithfully serialized data model would produce the desired one.
More useful would likely be the opposite, i.e. a document that would produce a faithful serialization of a non-faithful one. Such a document would allow clients to first transform the document to a known serialization for further processing. This brings one close to the global-as-view approach, in that the global schema, the data model, is represented as a view on the source.
More discussion will be needed to address these issues.
(TODO OL: The most important thing to stress here imo is that defining new
mappings is equivalent to writing software drivers: documents similar to this
one will make it possible to implement standardized mappings from vodml to other
metamodels. We will most likely have some json
and yaml
examples. We should
try and ask somebody to explore RDF
mappings, which would be particularly
useful and interesting. In the end VOTable is the current standard for IVOA, but
by no means one of the standards in the community. Having mechanisms for
faithfully roundtrip between different formats is an important part of this
specification.)
Footnotes
-
E.g. when interpreting an
<INSTANCE>
as a certain ObjectType, if one of its children is not annotated or identifies a child element that is not available on the type, this is an error. Fr each pattern there is a set of rules that, if broken, are annotation errors. (TODO we better strive to make these comprehensive.) ↩ -
ivoa types that do not exist in VOTable may there be represented using appropriate xtypes, e.g. adql:timestamp for ivoa:datetime. ↩
-
One can use multiple columns, that together must be unique (and NOT NULL!). ↩
-
E.g. consider SELECT a-b FROM foo. Even if columns a and b have a UCD, what is the UCD of their difference? ↩