diff --git a/schemas/test-catalog.rnc b/schemas/test-catalog.rnc
index bcbc7b4c..aa19b515 100644
--- a/schemas/test-catalog.rnc
+++ b/schemas/test-catalog.rnc
@@ -3,520 +3,697 @@ namespace unqualified = ""
grammar {
- # RNC grammar for test catalog.
- #
- # Revisions:
- # 2023-03-13 : CMSMcQ : Add metadata for dependencies (and correct typos)
- # 2022-05-31 : CMSMcQ : Make 'error' attribute obligatory,
- # add 'wrong-error' as result.
- # 2022-04-12 : CMSMcQ : Move base version of this to ixml repo
- # 2022-04-11 : CMSMcQ : Add dynamic-error as expected result
- # 2022-02-14 : CMSMcQ : Move metadata from attributes to elements
- # 2022-02-06 : CMSMcQ : Add a quick and dirty report format.
- # 2021-12-22 : CMSMcQ : Make 'created' optional on individual tests;
- # notionally, let it be inherited from test set.
- # 2021-11-11 : CMSMcQ : Revamp result to allow multiple results
- # and include assert-not-a-grammar.
- # Rewrite some comments.
- # 2021-10-31 : CMSMcQ : Commit some changes: @name on test-case,
- # allow at most one grammar for each test
- # set (grammars may be inherited from ancestor
- # test sets).
- # 2021-01-25 : CMSMcQ : Sketch this out by hand.
- #
- # To do:
- # - rewrite test-set to allow test cases only if a grammar is
- # specified on the test-set or some ancestor.
- # - allow description to be (p+ | xhtml:div+) HTML
- # - supply types for tokenized attributes?
- #
-# Notational convention: definitions starting in uppercase (e.g.
-# Metadata, Grammar-spec) are for content-model expressions.
-# Definitions starting in lowercase (e.g. test-catalog) are for
-# individual elements, usually with the same name as the element.
-# (Exception: element test-set has two definitions, test-set-0
-# and test-set-1.)
-# The normal starting points are test-catalog and test-report.
-# But to allow individual test sets and tests to be reported
-# separately, we also allow lower-level result elements as the
-# start symbol.
- start = test-catalog | test-report
- | test-set-results | grammar-result | test-result
-# test-catalog, test-report
- # A test catalog is a collection of test sets, with common
- # metadata.
- test-catalog = element test-catalog {
- attribute name { text },
- attribute release-date { xsd:date },
- external-atts,
- (Metadata
- &
- (test-set-0 | test-set-ref)*)
- }
- # A test report is a collection of test set reports, with common
- # metadata.
- test-report = element test-report {
- element metadata {
- (element name { text },
- element report-date { xsd:date | xsd:dateTime },
- element processor { text },
- element processor-version { text }?,
- element catalog-uri { text },
- element catalog-date { text }?)
- &
- Metadata
- },
- external-atts,
- (Metadata
- &
- test-set-results*)
- }
-# Metadata
- # At various levels we allow metadata: prose descriptions,
- # pointers to external documentation, or arbitrary XML
- # elements ('application-specific information'), and
- # miscellaneous technical details about dependencies of a test
- # (or, usually, of the test result) and for a test result the
- # environment within which a test was run.
- Metadata = (description | app-info | doc | dependencies)*
- # The 'description' element contains a prose description.
- # Say what you think needs saying.
- description = element description {
- external-atts,
- p*
- }
- # The 'doc' element carries an 'href' attribute pointing to
- # relevant external documentation.
- doc = element doc {
- external-atts,
- attribute href { xsd:anyURI }
- }
- # The 'app-info' is an escape hatch which can contain any XML
- # at all. It can be used for processor-specific information.
- # (Please document what you do!)
- app-info = element app-info {
- external-atts,
- any-element*
- }
- # The 'options' element (in the test-catalog namespace, but
- # allowed only within app-info) is used to mark results which
- # depend (for a given processor) on the options with which the
- # processor was invoked. Options are assumed describable with
- # name/value pairs encoded as namespace-qualified attributes.
- # Typically the attribute name names the option, and the value
- # says how to set it. Examples and some discussion are in
- # ../tests/grammar-misc/test-catalog.xml
- # If all the option/setting pairs on any options element in
- # the app-info element apply, then any of the results
- # specified in that app-info element is acceptable.
- # So: for both the options elements and the results in the
- # app-info there is an implicit disjunction: if any of the
- # options elements applies, then any of the results is OK.
- # For the various name/value pairs on an options element,
- # there is an implicit conjunction: the options element
- # applies if ALL of the name/value pairs apply.
- # N.B. The options element, and the method of handling options
- # it represents, is to be regarded as experimental.
- options = element options {
- external-atts,
- empty
- }
- # The environment element works much the same way as the
- # options element; when results reported for a test depend on
- # the environment (e.g. which version of Java is used, or
- # which browser an in-browser processor uses, or ...), then
- # the relevant information should be given on an 'environment'
- # element wrapped in an 'app-info' element at the appropriate
- # level of the test results. (Top level if applicable to all,
- # test set if applicable only to that test set, test result if
- # applicable to that result.)
- # The difference between options and environment is that
- # options are assumed to be settable at parse time by whoever
- # calls the ixml processor, and the environment is less likely
- # to be settable that way. In case of gray areas, explain
- # your usage in the test catalog.
- environment = element environment {
- external-atts,
- empty
- }
- # The difference between options and environment is that
- # options are assumed to be settable at parse time by whoever
- # calls the ixml processor, and the environment is less likely
- # to be settable that way. In case of gray areas, explain
- # your usage in the test catalog.
- # The 'dependencies' element identifies conditions that must
- # hold for the results given for a test to hold. Like
- # 'options' and 'environment', it allows an arbitrary set of
- # name/value pairs (namespace-qualified attributes). If all
- # of them apply, the test result given is applicable.
- # Some dependencies are standardized: any processor must
- # conform to some version of Unicode but we don't specify which,
- # so the processor must specify. Test results must be labeled
- # with the appropriate Unicode version(s).
- dependencies = element dependencies {
- attribute Unicode-version { text },
- external-atts,
- empty
- }
- # The differences are:
- # options - implementation-defined, typically settable by caller
- # at parse time. Wrap in app-info to label results
- # (often non-standard) which depend on how the
- # processor was invoked.
- # environment - relevant but not under implementation control.
- # Wrap in app-info, use to label results which depend
- # on the environment within which the processor is
- # running (or within which a test result was obtained).
- # dependencies - used to label test cases whose results
- # depend on which version of another spec is applicable.
-# test-set, test-set-results
- # A test set is a collection of tests (or possibly subordinate
- # test sets, or both) with common metadata and a common
- # grammar.
- # Test cases are allowed only after a grammar is specified.
- # We keep track of whether an ancestor has specified a grammar
- # by having two nonterminals for test sets: test-set-0 is used
- # when no ancestor has specified a grammar, test-set-1 when
- # at least one grammar has been specified.
- # If no ancestor has specified a grammar, test cases are allowed
- # in this test set only if this test set does specify a grammar.
- # Use test-set-0 or -1 to pass the news along.
- test-set-0 = element test-set {
- attribute name { text },
- external-atts,
- (Metadata
- &
- (History,
- ( (test-set-0 | test-set-ref)*
- | (Grammar-spec, (test-set-1 | test-set-ref | test-case)*) )))
- }
- # If an ancestor has specified a grammar, test cases are allowed
- # in this test set even if there is no grammar at this level.
- test-set-1 = element test-set {
- attribute name { text },
- external-atts,
- (Metadata
- &
- (History,
- Grammar-spec?,
- (test-set-1 | test-set-ref | test-case)*))
- }
- test-set-results = element test-set-results {
- attribute name { text },
- external-atts,
- (Metadata
- &
- (Grammar-results?,
- (test-set-results | test-result)*))
- }
- # Grammars can be in invisible XML or in visible XML.
- # They can be inline or external. They can be marked
- # as a grammar test or not.
- Grammar-data = (ixml-grammar
- | vxml-grammar
- | ixml-grammar-ref
- | vxml-grammar-ref)
- Grammar-spec = (Grammar-data, grammar-test?)
- # In the results file, we may omit the grammar, or include
- # it, possibly both reproducing the reference and giving
- # the grammar inline.
- Grammar-results = (Grammar-data*, grammar-result*)
- # Q. Why is the grammar optional?
- # A. Because in a nested test set we may want to inherit the
- # grammar from the parent test set. In a top-level test
- # set with no direct test-case children, we may just be
- # pointing to multiple test sets which each provide their
- # own grammar. By the time we reach a test case we must
- # have at least one grammar, but we don't need on at every
- # level.
- # Q. Why can't there be multiple grammars?
- # A. First, it's error prone: it would work only if all of them
- # were guaranteed equivalent. We don't want to have to check
- # that, and we don't want the mess that will result if it
- # turns out not to be true. Second, it complicates reporting
- # unnecessarily. It's simpler when one test case is one
- # grammar + input + result triple.
- test-set-ref = element test-set-ref {
- external-atts,
- attribute href { xsd:anyURI }
- }
- # ixml-grammar: grammar in invisible-XML form
- ixml-grammar = element ixml-grammar {
- external-atts,
- text
- }
- ixml-grammar-ref = element ixml-grammar-ref {
- external-atts,
- attribute href { xsd:anyURI }
- }
- # vxml-grammar: grammar in visible-XML form (either a parsed
- # ixml grammar, translated into XML, or something created in
- # XML)
- #
- # N.B. It is tempting to embed a schema for ixml grammars here
- # to enforce the correct XML form. But we do not require a
- # legal ixml grammar, because it may be a negative test case.
- vxml-grammar = element vxml-grammar {
- external-atts,
- any-element
- }
- vxml-grammar-ref = element vxml-grammar-ref {
- external-atts,
- attribute href { xsd:anyURI }
- }
- # grammar-test: signals that this grammar should be checked
- # and either accepted or declined as a grammar.
- grammar-test = element grammar-test {
- external-atts,
- (Metadata & (History?, result))
- }
- grammar-result = element grammar-result {
- attribute result { result-type },
- external-atts,
- (Metadata & (result-report?))
- }
-# test-case
- # test-case: describes one test case, with metadata, history,
- # and expected result.
- test-case = element test-case {
- attribute name { text },
- external-atts,
- (Metadata & (History?, Test-string, result))
- }
- test-result = element test-result {
- attribute name { text },
- attribute result { result-type },
- external-atts,
- (Metadata &
- (Grammar-data*, (Test-string)*, result-report?)
- )
- }
- result-type = 'pass' # results are as expected
- | 'fail' # results not as expected
- | 'wrong-error' # right overall result, wrong error code
- | 'wrong-state' # right overall result, wrong ixml:state value(s)
- | 'not-run'
- | 'other'
- # Test-string: in-line or external
- Test-string = (test-string | test-string-ref)
- test-string = element test-string {
- external-atts,
- text
- }
- test-string-ref = element test-string-ref {
- external-atts,
- attribute href { xsd:anyURI }
- }
-# result
- # result: specifies the expected result of a test;
- # contains an assertion of some kind.
- result = element result {
- external-atts,
- Assertion
- }
- result-report = element result {
- external-atts,
- Assertion?,
- Observation?
- }
-# Test assertions
- # Several kinds of result are possible.
- #
- # - In the common case we will have one expected XML result. We
- # specify it with assert-xml or assert-xml-ref (inline or
- # external).
- #
- # - For ambiguous sentences, we may and should specify several
- # XML results, any of which is acceptable. So the XML
- # assertions can repeat, with an implicit OR as their meaning.
- #
- # - In the case of infinite ambiguity, we can and should specify
- # a finite subset of the expected results, which we add to as
- # needed.
- #
- # - If the input is not be a sentence in the language defined
- # by the grammar, we use assert-not-a-sentence.
- #
- # - If the grammar specified is not a conforming ixml grammar,
- # then we use assert-not-a-grammar.
- #
- # - If the particular grammar + input pair would produce
- # ill-formed output if the normal rules were followed, then
- # we use assert-dynamic-error.
- #
- # Logically speaking, in the case of a grammar-test, there is no
- # useful distinction between assert-not-a-sentence and
- # assert-not-a-grammar. Casuists can argue over which makes
- # more sense, but in practice they should be treated as
- # equivalent. They are usefully different only for normal
- # test cases.
- #
- # Since dynamic errors are allowed to be caught statically,
- # some processors may return assert-not-a-grammar when the test
- # catalog expects assert-dynamic-error.
- #
- # Errors in the grammar and dynamic errors may be associated
- # with error codes. These are now required.
- Assertion = ((assert-xml-ref | assert-xml)+
- | assert-not-a-sentence
- | assert-not-a-grammar
- | assert-dynamic-error)
- Error-Code = attribute error-code { text }
- assert-xml-ref = element assert-xml-ref {
- external-atts,
- attribute href { xsd:anyURI }
- }
- assert-xml = element assert-xml {
- external-atts,
- any-element+
- }
- assert-not-a-sentence = element assert-not-a-sentence {
- external-atts,
- Metadata
- }
- assert-not-a-grammar = element assert-not-a-grammar {
- Error-Code,
- external-atts,
- Metadata
- }
- assert-dynamic-error = element assert-dynamic-error {
- Error-Code,
- external-atts,
- Metadata
- }
- Observation = ((reported-xml-ref | reported-xml)+
- | reported-not-a-sentence
- | reported-not-a-grammar
- | reported-dynamic-error)
+ # RNC grammar for test catalog.
+ #
+ # Revisions:
+ # 2024-05-01 : CMSMcQ : re-indent, use div for grouping, add
+ # double-hash comments for elements and
+ # important patterns
+ # 2023-03-13 : CMSMcQ : Add metadata for dependencies (and
+ # correct typos)
+ # 2022-05-31 : CMSMcQ : Make 'error' attribute obligatory,
+ # add 'wrong-error' as result.
+ # 2022-04-12 : CMSMcQ : Move base version of this to ixml
+ # repo
+ # 2022-04-11 : CMSMcQ : Add dynamic-error as expected result
+ # 2022-02-14 : CMSMcQ : Move metadata from attributes to
+ # elements
+ # 2022-02-06 : CMSMcQ : Add a quick and dirty report format.
+ # 2021-12-22 : CMSMcQ : Make 'created' optional on individual
+ # tests; notionally, let it be
+ # inherited from test set.
+ # 2021-11-11 : CMSMcQ : Revamp result to allow multiple
+ # results and include
+ # assert-not-a-grammar. Rewrite some
+ # comments.
+ # 2021-10-31 : CMSMcQ : Commit some changes: @name on
+ # test-case, allow at most one grammar
+ # for each test set (grammars may be
+ # inherited from ancestor test sets).
+ # 2021-01-25 : CMSMcQ : Sketch this out by hand.
+ #
+ # To do:
+ # - allow description to be (p+ | xhtml:div+) HTML
+ # - supply types for tokenized attributes?
+ #
+ # Notational convention: definitions starting in uppercase
+ # (e.g. Metadata, Grammar-spec) are for content-model
+ # expressions. Definitions starting in lowercase
+ # (e.g. test-catalog) are for individual elements, usually
+ # with the same name as the element.
+ #
+ # (Exception: element test-set has two definitions,
+ # test-set-0 and test-set-1.)
+ # The normal starting points are test-catalog and
+ # test-report. But to allow individual test sets and tests
+ # to be reported separately, we also allow lower-level result
+ # elements as the start symbol: test-set-results,
+ # grammar-result, test-result.
+ start = test-catalog | test-report
+ | test-set-results | grammar-result | test-result
+ div {
+ # test-catalog, test-report
+ ## test-catalog: A test catalog is a collection of test
+ ## sets, with common metadata.
+ test-catalog = element test-catalog {
+ attribute name { text },
+ attribute release-date { xsd:date },
+ external-atts,
+ (Metadata
+ &
+ (test-set-0 | test-set-ref)*)
+ }
+ ## test-report: A test report is a collection of test set
+ ## reports, with common metadata.
+ test-report = element test-report {
+ element metadata {
+ (element name { text },
+ element report-date {
+ xsd:date | xsd:dateTime
+ },
+ element processor { text },
+ element processor-version { text }?,
+ element catalog-uri { text },
+ element catalog-date { text }?)
+ &
+ Metadata
+ },
+ external-atts,
+ (Metadata
+ &
+ test-set-results*)
+ }
+ }
+ div {
+ # Metadata
+ # At various levels we allow metadata: prose descriptions,
+ # pointers to external documentation, or arbitrary XML
+ # elements ('application-specific information'), and
+ # miscellaneous technical details about dependencies of a
+ # test (or, usually, of the test result) and for a test
+ # result the environment within which a test was run.
+ ## Metadata: descriptions, documentation, dependencies,
+ ## or application-specific information
+ Metadata = (description | app-info | doc | dependencies)*
+ ## description: a prose description of the item.
+ ## Say what you think needs saying.
+ description = element description {
+ external-atts,
+ p*
+ }
+ ## doc: pointer to documentation relevant to the item.
+ ## The 'href' attribute gives the URI.
+ doc = element doc {
+ external-atts,
+ attribute href { xsd:anyURI }
+ }
+ ## app-info: The 'app-info' element is an escape hatch which
+ ## can contain any XML at all. It can be used for
+ ## processor-specific information. (Please document what you
+ ## do!)
+ app-info = element app-info {
+ external-atts,
+ any-element*
+ }
+ ## options: The 'options' element is embedded within app-info
+ ## to mark results which depend (for a given processor) on
+ ## the options with which the processor was invoked.
+ options = element options {
+ external-atts,
+ empty
+ # N.B. The 'options' element is in the
+ # test-catalog namespace, but it is allowed
+ # only within app-info.
+ # Options are assumed describable with
+ # name/value pairs encoded as
+ # namespace-qualified attributes. Typically
+ # the attribute name names the option, and the
+ # value says how to set it.
+ # Examples and some discussion are in
+ # ../tests/grammar-misc/test-catalog.xml
+ # If all the option/setting pairs on any
+ # options element in the app-info element
+ # apply, then any of the results specified in
+ # that app-info element is acceptable.
+ # So: for both the options elements and the
+ # results in the app-info there is an implicit
+ # disjunction: if any of the options elements
+ # applies, then any of the results is OK. For
+ # the various name/value pairs on an options
+ # element, there is an implicit conjunction:
+ # the options element applies if ALL of the
+ # name/value pairs apply.
+ # N.B. The options element, and the method of
+ # handling options it represents, is to be
+ # regarded as experimental.
+ }
+ ## environment: describes possible dependency of a test result
+ ## on the environment within which the test is run.
+ environment = element environment {
+ external-atts,
+ empty
+ # The 'environment' element works much
+ # the same way as the options element;
+ # when results reported for a test
+ # depend on the environment (e.g. which
+ # version of Java is used, or which
+ # browser an in-browser processor uses,
+ # or ...), then the relevant
+ # information should be given on an
+ # 'environment' element wrapped in an
+ # 'app-info' element at the appropriate
+ # level of the test results. (Top
+ # level if applicable to all, test set
+ # if applicable only to that test set,
+ # test result if applicable to that
+ # result.)
+ }
+ # The difference between options and environment is that
+ # options are assumed to be settable at parse time by whoever
+ # calls the ixml processor, and the environment is less
+ # likely to be settable that way. In case of gray areas,
+ # explain your usage in the test catalog.
+ ## dependencies: identifies conditions that must hold for the
+ ## results given for a test to hold.
+ dependencies = element dependencies {
+ attribute Unicode-version { text },
+ external-atts,
+ empty
+ # Like 'options' and 'environment', it
+ # allows an arbitrary set of
+ # name/value pairs
+ # (namespace-qualified attributes).
+ # If all of them apply, the test
+ # result given is applicable.
+ # Some dependencies are standardized:
+ # any processor must conform to some
+ # version of Unicode but we don't
+ # specify which, so the processor must
+ # specify. Test results must be
+ # labeled with the appropriate Unicode
+ # version(s).
+ }
+ # The differences among these three elements for describing
+ # when a test or a result is relevant are:
+ # options - implementation-defined, typically settable by
+ # caller at parse time. Wrap in app-info to label
+ # results (often non-standard) which depend on how
+ # the processor was invoked.
+ # environment - relevant but not under implementation
+ # control. Wrap in app-info, use to label results
+ # which depend on the environment within which the
+ # processor is running (or within which a test
+ # result was obtained).
+ # dependencies - used to label test cases whose results
+ # depend on which version of another spec is
+ # applicable.
+ }
+ div {
+ # test-set, test-set-results
+ # A test set is a collection of tests (or possibly
+ # subordinate test sets, or both) with common metadata and a
+ # common grammar.
+ # Test cases are allowed only after a grammar is specified.
+ # We keep track of whether an ancestor has specified a
+ # grammar by having two nonterminals for test sets:
+ # test-set-0 is used when no ancestor has specified a
+ # grammar, test-set-1 when at least one grammar has been
+ # specified.
+ # If no ancestor has specified a grammar, test cases are
+ # allowed in this test set only if this test set does specify
+ # a grammar. Use test-set-0 or -1 to pass the news along.
+ ## test-set (pattern test-set-0): a test set with no
+ ## grammar inherited from any ancestor.
+ test-set-0 = element test-set {
+ attribute name { text },
+ external-atts,
+ (Metadata
+ &
+ (History,
+ ( (test-set-0 | test-set-ref)*
+ | (Grammar-spec,
+ (test-set-1
+ | test-set-ref
+ | test-case)*) )))
+ }
+ # If an ancestor has specified a grammar, test cases are allowed
+ # in this test set even if there is no grammar at this level.
+ ## test-set (pattern test-set-1): a test set with a grammar
+ ## inherited from an ancestor.
+ test-set-1 = element test-set {
+ attribute name { text },
+ external-atts,
+ (Metadata
+ &
+ (History,
+ Grammar-spec?,
+ (test-set-1 | test-set-ref | test-case)*))
+ }
+ ## test-set-ref: a reference to a test set located in
+ ## another test catalog; the 'href' attribute gives the URI.
+ test-set-ref = element test-set-ref {
+ external-atts,
+ attribute href { xsd:anyURI }
+ }
+ ## test-set-results: contains reports of results from running
+ ## the test cases of a given test set.
+ test-set-results = element test-set-results {
+ attribute name { text },
+ external-atts,
+ (Metadata
+ &
+ (Grammar-results?,
+ (test-set-results | test-result)*))
+ }
+ }
+ div {
+ # Specifying the grammar for a set of tests
+ # Grammars can be in invisible XML or in visible XML. They
+ # can be inline or external. They can be marked as a grammar
+ # test or not.
+ ## Grammar-data: four ways to specify the grammar for a test
+ ## set.
+ Grammar-data = (ixml-grammar
+ | vxml-grammar
+ | ixml-grammar-ref
+ | vxml-grammar-ref)
+ ## Grammar-spec: specification of the grammar for a test set,
+ ## optionally treating the grammar itself as a test case to
+ ## be parsed against the specification grammar.
+ Grammar-spec = (Grammar-data, grammar-test?)
+ # In the results file, we may omit the grammar, or include
+ # it, possibly both reproducing the reference and giving
+ # the grammar inline.
+ ## Grammar-results: optional reproduction of the grammar used
+ ## for the test set, and reports of any grammar tests.
+ Grammar-results = (Grammar-data*, grammar-result*)
+ # Q. Why is the grammar optional?
+ # A. Because in a nested test set we may want to inherit the
+ # grammar from the parent test set. In a top-level test
+ # set with no direct test-case children, we may just be
+ # pointing to multiple test sets which each provide their
+ # own grammar. By the time we reach a test case we must
+ # have at least one grammar, but we don't need one at
+ # every level.
+ # Q. Why can't there be multiple grammars?
+ # A. First, it's error prone: it would work only if all of
+ # them were guaranteed equivalent. We don't want to have
+ # to check that, and we don't want the mess that will
+ # result if it turns out not to be true. Second, it
+ # complicates reporting unnecessarily. It's simpler when
+ # one test case is one grammar + input + result triple.
+ ## ixml-grammar: a grammar in invisible-XML form, given
+ ## inline in the test catalog.
+ ixml-grammar = element ixml-grammar {
+ external-atts,
+ text
+ }
+ ## ixml-grammar-ref: a reference to a grammar in
+ ## invisible-XML form located elsewhere. The 'href'
+ ## attribute says where.
+ ixml-grammar-ref = element ixml-grammar-ref {
+ external-atts,
+ attribute href { xsd:anyURI }
+ }
+ ## vxml-grammar: grammar in visible-XML form (either a parsed
+ ## ixml grammar, translated into XML, or something created in
+ ## XML), given inline in the test catalog.
+ vxml-grammar = element vxml-grammar {
+ external-atts,
+ any-element
+ }
+ # N.B. It is tempting to embed a schema for ixml grammars here
+ # to enforce the correct XML form. But we do not require a
+ # legal ixml grammar, because it may be a negative test case.
+ ## vxml-grammar-ref: reference to a grammar in visible-XML
+ ## form (either a parsed ixml grammar, translated into XML,
+ ## or something created in XML), given elsewhere (as
+ ## indicated by the 'href' attribute).
+ vxml-grammar-ref = element vxml-grammar-ref {
+ external-atts,
+ attribute href { xsd:anyURI }
+ }
+ ## grammar-test: signals that this grammar should be checked
+ ## and either accepted or declined as a grammar.
+ grammar-test = element grammar-test {
+ external-atts,
+ (Metadata & (History?, result))
+ }
+ ## grammar-result: reports the result of a grammar test.
+ grammar-result = element grammar-result {
+ attribute result { result-type },
+ external-atts,
+ (Metadata & (result-report?))
+ }
+ }
+ div {
+ # test-case
+ ## test-case: describes one test case, with metadata,
+ ## history, and expected result.
+ test-case = element test-case {
+ attribute name { text },
+ external-atts,
+ (Metadata & (History?, Test-string, result))
+ }
+ ## test-result: reports the result of one test case.
+ test-result = element test-result {
+ attribute name { text },
+ attribute result { result-type },
+ external-atts,
+ (Metadata &
+ (Grammar-data*,
+ (Test-string)*,
+ result-report?)
+ )
+ }
+ ## result-type: keyword description of test result
+ result-type = ## results are as expected
+ 'pass'
+ | ## results not as expected
+ 'fail'
+ | ## right overall result, wrong error code
+ 'wrong-error'
+ | ## right overall result, wrong ixml:state value(s)
+ 'wrong-state'
+ | ## test case was not run (explain!)
+ 'not-run'
+ | ## none of the above
+ 'other'
+ ## Test-string: input string, in-line or external
+ Test-string = (test-string | test-string-ref)
+ ## test-string: this element contains the input string
+ ## for the test case.
+ test-string = element test-string {
+ external-atts,
+ text
+ }
+ ## test-string-ref: this element carries a pointer to an
+ ## external resource which contains the input string for the
+ ## test case.
+ test-string-ref = element test-string-ref {
+ external-atts,
+ attribute href { xsd:anyURI }
+ }
+ }
+ div {
+ # result
+ ## result: specifies the expected result of a test; contains
+ ## an assertion of some kind.
+ result = element result {
+ external-atts,
+ Assertion
+ }
+ ## result-report: specifies the observed result of running a
+ ## test case. May repeat the assertion describing the
+ ## expected result, and may report what was actually observed
+ ## when the test was run.
+ result-report = element result {
+ external-atts,
+ Assertion?,
+ Observation?
+ }
+ }
+ div {
+ # Test assertions
+ # Several kinds of result are possible.
+ #
+ # - In the common case we will have one expected XML result.
+ # We specify it with assert-xml or assert-xml-ref (inline
+ # or external).
+ #
+ # - For ambiguous sentences, we may and should specify
+ # several XML results, any of which is acceptable. So the
+ # XML assertions can repeat, with an implicit OR as their
+ # meaning.
+ #
+ # - In the case of infinite ambiguity, we can and should
+ # specify a finite subset of the expected results, which we
+ # add to as needed.
+ #
+ # - If the input is not a sentence in the language defined
+ # by the grammar, we use assert-not-a-sentence.
+ #
+ # - If the grammar specified is not a conforming ixml
+ # grammar, then we use assert-not-a-grammar.
+ #
+ # - If the particular grammar + input pair would produce
+ # ill-formed output if the normal rules were followed, then
+ # we use assert-dynamic-error.
+ #
+ # Logically speaking, in the case of a grammar-test, there is
+ # no useful distinction between assert-not-a-sentence and
+ # assert-not-a-grammar. Casuists can argue over which makes
+ # more sense, but in practice they should be treated as
+ # equivalent. The two assertions are usefully different only
+ # for normal test cases.
+ #
+ # Since dynamic errors are allowed to be caught statically,
+ # some processors may return assert-not-a-grammar when the
+ # test catalog expects assert-dynamic-error.
+ #
+ # Errors in the grammar and dynamic errors may be associated
+ # with error codes. These are now required.
+ ## Assertion: things a catalog can say about an expected test
+ ## result.
+ Assertion = ((assert-xml-ref | assert-xml)+
+ | assert-not-a-sentence
+ | assert-not-a-grammar
+ | assert-dynamic-error)
+ ## Error-Code: an attribute for specifying an error code
+ ## expected for a test case, or observed in a test.
+ Error-Code = attribute error-code { text }
+ ## assert-xml-ref: asserts that the result of the test case
+ ## is expected to match the external XML document pointed to
+ ## by the 'href' attribute.
+ assert-xml-ref = element assert-xml-ref {
+ external-atts,
+ attribute href { xsd:anyURI }
+ }
+ ## assert-xml: asserts that the result of the test case is
+ ## expected to match the XML contained.
+ assert-xml = element assert-xml {
+ external-atts,
+ any-element+
+ }
+ ## assert-not-a-sentence: asserts that the input string is
+ ## not a sentence in the language defined by the input
+ ## grammar.
+ assert-not-a-sentence = element assert-not-a-sentence {
+ external-atts,
+ Metadata
+ }
+ ## assert-not-a-grammar: asserts that the input grammar given
+ ## is not a conforming ixml grammar. This may be because
+ ## it's not a sentence in the language defined by the ixml
+ ## specification grammar, or for other reasons.
+ assert-not-a-grammar = element assert-not-a-grammar {
+ Error-Code,
+ external-atts,
+ Metadata
+ }
+ ## assert-dynamic-error: asserts that when the input string
+ ## is parsed against the input grammar and written out as
+ ## XML, a dynamic error is expected to result. Note that
+ ## processors are allowed to detect dynamic errors statically
+ ## and report with a 'reported-not-a-grammar'.
+ assert-dynamic-error = element assert-dynamic-error {
+ Error-Code,
+ external-atts,
+ Metadata
+ }
+ ## Observation: things a test report can say about
+ ## an observed test result.
+ Observation = ((reported-xml-ref | reported-xml)+
+ | reported-not-a-sentence
+ | reported-not-a-grammar
+ | reported-dynamic-error)
+ ## reported-xml-ref: reports that when the test case
+ ## was run, the processor produced the XML document
+ ## pointed to by the 'href' attribute.
+ reported-xml-ref = element reported-xml-ref {
+ external-atts,
+ attribute href { xsd:anyURI }
+ }
+ ## reported-xml: reports that when the test case was run, the
+ ## processor produced the XML output contained in the
+ ## element.
+ reported-xml = element reported-xml {
+ external-atts,
+ any-element+
+ }
+ ## reported-not-a-sentence: reports that when the test case
+ ## was run, the processor reported that parsing failed (i.e.
+ ## that the input string is not a sentence in the language
+ ## defined by the input grammar).
+ reported-not-a-sentence = element reported-not-a-sentence {
+ external-atts,
+ Metadata
+ }
+ ## reported-not-a-grammar: reports that when the test case
+ ## was run, the processor reported that the input grammar
+ ## was not a conforming ixml grammar.
+ ## Note that this may be reported when the processor
+ ## detects that serializing the result would raise a
+ ## dynamic error.
+ reported-not-a-grammar = element reported-not-a-grammar {
+ Error-Code,
+ external-atts,
+ Metadata
+ }
+ ## reported-dynamic-error: reports that when the test case
+ ## was run, the processor reported a dynamic error.
+ reported-dynamic-error = element reported-dynamic-error {
+ Error-Code,
+ external-atts,
+ Metadata
+ }
- reported-xml-ref = element reported-xml-ref {
- external-atts,
- attribute href { xsd:anyURI }
- }
- reported-xml = element reported-xml {
- external-atts,
- any-element+
- }
- reported-not-a-sentence = element reported-not-a-sentence {
- external-atts,
- Metadata
- }
- reported-not-a-grammar = element reported-not-a-grammar {
- Error-Code,
- external-atts,
- Metadata
- }
- reported-dynamic-error = element reported-dynamic-error {
- Error-Code,
- external-atts,
- Metadata
- }
-# Common constructs
- # History: creation and modification history
- History = (created, modified*)
- who-when = attribute by { text },
- attribute on { xsd:date }
- created = element created {
- who-when
- }
- modified = element modified {
- who-when,
- attribute change { text }
- }
- # Elements for simple prose.
- p = element p { phrases }
- phrases = (text | emph | code)*
- emph = element emph { phrases }
- code = element code { text }
- # Arbitrary XML
- anything = (any-element | any-attribute | text)*
- any-element = element * { anything }
- any-attribute = attribute * { text }
- external-atts = nsq-att*
- nsq-att = attribute (* - unqualified:*) { text }
+ }
+ div {
+ # Common constructs
+ ## History: creation and modification history
+ History = (created, modified*)
+ ## who-when: attributes for reporting who did
+ ## something and when they did it.
+ who-when = attribute by { text },
+ attribute on { xsd:date }
+ ## created: reports who created the item (test catalog, test
+ ## set, test case, ...) and when.
+ created = element created {
+ who-when
+ }
+ ## modified: reports who changed the item (test catalog, test
+ ## set, test case, ...) and when.
+ modified = element modified {
+ who-when,
+ attribute change { text }
+ }
+ # Elements for simple prose.
+ ## p: a paragraph of simple prose.
+ p = element p { phrases }
+ ## phrases: possible content of a paragraph.
+ phrases = (text | emph | code)*
+ ## emph: marks a phrase emphasized either rhetorically or
+ ## typographically or both. (Expected rendering: italic.)
+ emph = element emph { phrases }
+ ## code: marks material from a machine-processable language
+ ## of some kind (e.g. a program). (Expected rendering:
+ ## monospaced.)
+ code = element code { text }
+ # Arbitrary XML
+ ## anything: a pattern matching arbitrary XML
+ anything = (any-element | any-attribute | text)*
+ ## any-element: a pattern matching one well-formed XML
+ ## element.
+ any-element = element * { anything }
+ ## any-attribute: a pattern matching one XML attribute.
+ any-attribute = attribute * { text }
+ ## external-atts: a pattern matching zero or more
+ ## namespace-qualified attributes.
+ external-atts = nsq-att*
+ ## nsq-att: a pattern matching one namespace-qualified
+ ## attribute.
+ nsq-att = attribute (* - unqualified:*) { text }
+ }
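The relation between Assertion and Observation described in the schema comments above can be sketched as a small comparison routine. What follows is an illustration in Python, not part of the schema. It encodes only the one allowance the comments state (a dynamic error may be caught statically and reported as not-a-grammar). The function name `judge`, the result labels, and the treatment of `error-code` as a whitespace-separated list are assumptions made for the sketch, and the error codes in the usage note are illustrative values.

```python
# Kinds of result that carry an error-code attribute in the schema.
ERROR_KINDS = {"not-a-grammar", "dynamic-error"}


def kind(element_name):
    """Strip the 'assert-' or 'reported-' prefix from an element name."""
    return element_name.split("-", 1)[1]


def judge(assertion, observation):
    """Compare an expected result with an observed one.

    Each argument is a pair (element name, error code or None).
    Returns 'pass', 'fail', or 'wrong-error' (hypothetical labels).
    """
    a_name, a_code = assertion
    o_name, o_code = observation
    expected, observed = kind(a_name), kind(o_name)
    if expected not in ERROR_KINDS | {"not-a-sentence"}:
        raise ValueError("comparison of XML results is outside this sketch")
    # Dynamic errors are allowed to be caught statically.
    caught_statically = (expected == "dynamic-error"
                         and observed == "not-a-grammar")
    if expected != observed and not caught_statically:
        return "fail"
    if expected in ERROR_KINDS:
        # Assumption: error-code holds one or more space-separated codes.
        if not set((a_code or "").split()) & set((o_code or "").split()):
            return "wrong-error"
    return "pass"
```

With this sketch, `judge(("assert-dynamic-error", "D01"), ("reported-not-a-grammar", "D01"))` yields a pass, while the same assertion paired with `("reported-dynamic-error", "D02")` is classed as a wrong error.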
diff --git a/tools/tsd/images/tsd-workflow.dot b/tools/tsd/images/tsd-workflow.dot
new file mode 100644
index 00000000..80b7b282
--- /dev/null
+++ b/tools/tsd/images/tsd-workflow.dot
@@ -0,0 +1,27 @@
+digraph tsd_dfd {
+ // sketch of a data flow for managing tag set documentation
+ subgraph {
+ node [shape=box];
+ rnc [label="RNC schema"];
+ rng [label="RNG\n(auto-generated)"];
+ auto [label="auto-TSD\n(tag-set description\n=Docbook refentry+\nauto-generated)"];
+ manual [label="manual TSD\n(tag-set description\n=Docbook refentry+\neditable)" fontcolor=red];
+ // tsd [label="TSD\n(partly editable)"];
+ tsd [label="TSD\n(tag-set description:\nDocbook refentry+\nprose and auto-generated)"];
+ node [shape=oval];
+ // editrnc [label="edit / regen"];
+ edittsd [label="edit"];
+ manual -> edittsd -> manual;
+ tsdxrng [label="auto-generate TSD"];
+ merge [label="merge\nauto- and manual parts"];
+ rnc -> trang -> rng -> tsdxrng -> auto -> merge -> tsd;
+ manual -> merge;
+ subgraph { rank = same; auto; manual; }
+ }
+ // rnc -> editrnc -> rnc [weight=0];
+ tsd -> manual [style=dotted weight=0 label="Re-use"];
+}
diff --git a/tools/tsd/images/tsd-workflow.dot.png b/tools/tsd/images/tsd-workflow.dot.png
new file mode 100644
index 00000000..6f9cb809
Binary files /dev/null and b/tools/tsd/images/tsd-workflow.dot.png differ
diff --git a/tools/tsd/images/tsd-workflow.dot.svg b/tools/tsd/images/tsd-workflow.dot.svg
new file mode 100644
index 00000000..e990ecce
--- /dev/null
+++ b/tools/tsd/images/tsd-workflow.dot.svg
@@ -0,0 +1,139 @@
diff --git a/tools/tsd/rng-to-TSD.xsl b/tools/tsd/rng-to-TSD.xsl
new file mode 100644
index 00000000..a5dc3422
--- /dev/null
+++ b/tools/tsd/rng-to-TSD.xsl
@@ -0,0 +1,632 @@
+ Tag set documentation for
+ auto-generated from schema by rng-to-TSD.xsl
+ Introduction
+ This is a skeletal framework for documentation of
+ .
It was generated automatically from the
+ schema by rng-to-TSD.xsl,
+ at
+ .
+ Alphabetical list of elements and patterns
+ Alphabetical list of elements
+ Alphabetical list of patterns
+ Tag set documentation for
+ [None. This tag set documentation
+ will not be distributed in this form.]
+ Generated automatically from
+ Document auto-generated from schema by rng-to-TSD.xsl
+ Reference documentation (skeleton)
+ for
+ Introduction
+ This is a skeletal framework for documentation of
+ .
It was generated automatically from the
+ schema by rng-to-TSD.xsl,
+ at
+ .
+ Alphabetical list of elements and patterns
+ Alphabetical list of elements
+ Alphabetical list of patterns
+ Tag set documentation for
+ Reference documentation (skeleton)
+ for
+ Introduction
+ This is a skeletal framework for documentation of
+ .
It was generated automatically from the
+ schema by rng-to-TSD.xsl,
+ at
+ .
+ Alphabetical list of elements and patterns
+ Alphabetical list of elements
+ Alphabetical list of patterns
+ (element)
+ [Description to be supplied.]
+ Remarks
+ ...
+ [Description to be supplied.]
+ ...
+ (element)
+ [Description to be supplied.]
+ watch this space
+ Remarks
+ ...
+ (pattern)
+ Remarks
+ ...
+ ...
+ (pattern)
+ watch this space
+ Remarks
+ ...
diff --git a/tools/tsd/tsd-planning.html b/tools/tsd/tsd-planning.html
new file mode 100644
index 00000000..9bdddaaa
--- /dev/null
+++ b/tools/tsd/tsd-planning.html
@@ -0,0 +1,700 @@
+This document outlines a plan for a workflow to create and maintain
+documentation for the XML vocabulary used for the XML form of ixml
+grammars (here called VXML), and the XML vocabulary used for test
+catalogs by the ixml Community Group.
+In its current form this document is not complete and is binding on
+no one. It is written to serve as a basis for discussion, and to
+record some thoughts and expectations.
1. Project overview
1.1. Primary deliverables
+The central deliverables are reference tag-set documentation (TSD) for
+the XML vocabularies in question.
+The tag-set documentation we wish to create consists of some
+expository prose and reference pages for the element types and
+widely used attributes.
+The crucial delivery format is XHTML; other XML vocabularies may be
+used for maintenance, but are not expected to be of interest to others.
1.2. Requirements
+Known requirements and desiderata:
It must be possible to update the documentation more or less
+conveniently as the schemas change.
When the schema changes, human-supplied prose must be carried
+forward easily.
Information derivable from the schema should be provided
+automatically. Specifically: declarations, lists of parents,
+lists of children, lists of attributes.
When schema-derived information changes, it is desirable that
+the user be warned, so that any relevant prose can also be
+updated.
1.3. Workflow
+The intended workflow is described in this diagram:
Figure 1: Workflow plan
+That is:
+The RNC/RNG schemas are maintained independently.
+The test catalog schema is maintained by hand in RNC; the ixml
+schema is generated automatically in RNG from the ixml grammar,
+which is maintained by hand. We use trang to make an RNG form of
+the test catalog schema, and an RNC form of the ixml schema.
+Not shown: we use jing -s to create a 'simplified' version of the
+RNG schema. In some cases, this may require some hand work. (Jing
+aborts with an error message if asked to simplify some schemas with
+recursive patterns. The simplified schema also uses some rather
+opaque names for patterns introduced by Jing.)
+An XSLT stylesheet (rng-to-TSD.xsl) auto-generates tag-set
+documentation for the schema.
+If names and short descriptions are provided in the RNG annotation
+namespace (a:documentation elements), they should be carried over.
+Otherwise, dummies should be provided.
+This stylesheet draws information about the vocabulary from both the
+RNG schema and the simplified RNG schema.
An XSLT stylesheet (tsd-merger.xsl) reads the auto-generated
+documentation and the previous hand-edited version of the same
+documentation, and produces merged output.
For schema-derived information, the auto-generated documentation
+is preferred; for other information (basically: the prose),
+the hand-edited documentation is preferred.
If any schema-derived information differs between the two
+sources, the stylesheet should report the fact to the user.
An XSLT stylesheet (tsd-to-html.xsl) reads the merged tag-set
+documentation and generates HTML with an appropriate stylesheet
+(tsd.css).
1.4. Secondary deliverables
+We have some secondary deliverables, whose purpose is to occupy
+the corresponding positions in the workflow.
Specification of the tag-set documentation vocabulary and
+conventions to be used. Obvious candidates are Docbook, TEI P3, TEI
+P5, and an ad hoc custom vocabulary.
2. Vocabulary for tag-set documentation
+We will use Docbook for the XML form of the tag-set documentation.
+Because Docbook's reference entries are rather generic, it may be
+helpful to specify the pattern to be followed there in more detail.
+Each refentry element should contain:
refdescriptor containing either "(element)" or "(attribute)" or
+"(pattern)"
refname with the element type name, attribute name, or pattern
+name
refpurpose containing (a) an unabbreviated form of the element
+type name, (b) a colon, and (c) a short description (typically one line)
+of the meaning or use of the construct
+synopsis role="rng-raw"
info containing the 'raw' Relax NG declaration for the
+construct:
For elements, the rng:element element.
For attributes, the rng:attribute element.
For patterns, the rng:define element.
+For comparison: this is similar to inclusion of an element
+declaration from a DTD with parameter entity references
+unexpanded.
synopsis role="rng-simplified" (for elements and attributes only)
+containing the corresponding declaration from the 'simplified'
+schema for the construct.
+For comparison: this is similar to inclusion of an element
+declaration from a DTD with all parameter entity references
+expanded.
(optional, for elements) synopsis role="structured" containing a
+structured description in English of the content model of the
+element. Not required, because in the usual case an English
+summary can be generated from the simplified RNG without trouble.
+Not forbidden, because it may be better to do this upstream rather
+than in the creation of the HTML delivery form.
(optional, for elements) refsection entitled "Contents" containing
+a prose description in English of the allowed contents of the
+element. Not required, because not always useful.
(for attributes) refsection entitled "Data description"
+with informal prose description of the attribute's datatype.
(for elements) refsection entitled "Attributes" listing all
+attributes defined for the element. If the attribute is used on
+more than one element, then we want just the attribute name with a
+hyperlink to the reference entry for the attribute; if the attribute
+is used only on this element, or should be given custom
+documentation for this parent, a version of the documentation
+pattern for attributes (perhaps attenuated) should be given.
(optional) refsection entitled "Remarks" with prose describing
+relevant information – whatever the user will need to know. For
+elements and attributes this includes recognition criteria,
+distinctions from similar elements or attributes, usage.
refsection entitled "Examples" with prose and examples. In
+some cases, this may just consist of references to examples
+given in other reference entries.
<refentry xml:id="element.assert-xml">
+ <refnamediv>
+ <refdescriptor>(element)</refdescriptor>
+ <refname>assert-xml</refname>
+ <refpurpose>Assert-xml: asserts that the expected
+ output of a conforming ixml processor will be (or,
+ in cases of ambiguity, may be) the child element
+ of the /assert-xml/ element.</refpurpose>
+ </refnamediv>
+ <refsynopsisdiv>
+ <synopsis role="rng-raw">
+ <info>
+ <element xmlns="http://relaxng.org/ns/structure/1.0"
+ xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
+ name="assert-xml">
+ <ref name="external-atts"/>
+ <oneOrMore>
+ <ref name="any-element"/>
+ </oneOrMore>
+ </element>
+ </info>
+ </synopsis>
+ <synopsis role="rng-simplified">
+ <info>
+ <element xmlns="http://relaxng.org/ns/structure/1.0"
+ xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
+ name="assert-xml">
+ <group>
+ <zeroOrMore>
+ <attribute>
+ <anyName>
+ <except>
+ <nsName ns=""/>
+ </except>
+ </anyName>
+ <text/>
+ </attribute>
+ </zeroOrMore>
+ <oneOrMore>
+ <ref name="_1"/>
+ </oneOrMore>
+ </group>
+ </element>
+ </info>
+ </synopsis>
+ </refsynopsisdiv>
+ <refsection>
+ <title>Contents</title>
+ <para>Any well-formed XML</para>
+ </refsection>
+ <refsection>
+ <title>Remarks</title>
+ <para>If the test catalog has a default namespace
+ declaration, it will be necessary to undeclare it in order
+ to avoid namespace capture of the asserted result. (IXML
+ output has no identified namespace.)</para>
+ <para>When comparing output of an ixml processor to the
+ asserted result, namespace declarations are to be
+ ignored.</para>
+ </refsection>
+</refentry>
3. Auto-generation of TSD (auto-tsd.xsl)
+[To be drafted.]
4. Merger of TSDs (tsd-merger.xsl)
+[To be drafted.]
5. HTML translation and display (tsd-to-html.xsl and tsd.css)
+[To be drafted.]
6. Related work
+There has been reference documentation for SGML and XML tag sets for
+about as long as there have been SGML and XML tag sets intended for
+serious use, but there has been very little standardization on the
+form of such documentation. Among the examples which have influenced
+this work are:
+Formex (1985). The name stands for "formalized exchange of electronic
+publications". The manual includes expository prose; its reference
+documentation includes two lists of data elements, one for a format
+called CCF (common communications format, an implementation of
+ISO 2709) and one for an SGML document type definition. The
+reference page for each element type includes:
A symbolic representation of the element's generic identifier
+(element type name) and attributes e.g. <AB LA = ...> for
+the AB (abstract) element with its LA (language) attribute.
A definition (or terse prose description) of the element.
A data description of the element, describing its content and
+format.
A usage note specifying whether the element is mandatory or
+optional, repeatable or non-repeatable.
A grouping note specifying what 'groups' the element is part
+of (this appears to be a list of possible parents or possibly
+higher-level containers).
An example (not always present).
+For each attribute, the reference page gives:
A definition (or terse prose description) of the attribute.
A data description defining the set of possible attribute
+values.
+Not listed here but prominent on each page are cross references to
+the corresponding CCF data elements.
Maler and El Andaloussi (1996). Maler and El Andaloussi recommend
+the following as the "minimal information" for element reference
+documentation:
Short name or actual generic identifier (e.g. olist).
Full name: descriptive phrase that explains the short name
+(e.g. "An ordered list of related items.").
Synopsis: rules for using the element, perhaps including tree
+diagrams showing possible parents and children.
Description: purpose, how and where it should be used,
+recognition criteria, etc.
Attributes: reference description for each attribute.
Contents and Contexts (if not already present in the description
+and if not clearly conveyed by the synopsis).
Processing notes, including notes on how to work around
+shortcomings in current tools.
+JATS documentation (current). This is a reasonably typical example
+of the tag-set documentation supplied by at least some commercially
+active SGML and XML consultants. Reference material for an element
+includes:
Generic identifier / element type name
Full name
Annotation (to specify which DTDs in a set contain the element)
Related elements
Content model
Content description (prose)
Presentation information (expected styling)
Examples (with prose commentary)
Related resources (pointers to other relevant information)
Source (if adapted from some other tag set)
Module (in a multi-module vocabulary)
Revision history
+There are similar structures for attributes and parameter entities.
+TEI P3 (1994). The auxiliary document type for 'tag set
+documentation' allows for each element type:
generic identifier
full name
short description (typically a one-liner)
list of attributes (with reference information for each)
examples (with commentary and explanation)
information on the part of TEI where the element is defined,
+the classes it belongs to, and the file(s) it is defined in
a data description (in prose)
a list of parents
a list of children
the text of the element's declaration
the text of the element's attribute-list declaration
hyperlinks to relevant documentation
a list of equivalent elements (in other vocabularies)
+TEI P5 (current) has modified the tag-set documentation of P3 and
+made many of the elements less specific.
+The reference material of Docbook has also had an obvious influence.
7. References
Formex: formalized exchange of electronic publications,
+ed. C. Guittet. Luxembourg: Office for Official Publications of the
+European Communities, 'New Technologies – Project Management'
+Department, 1985. 243 pp.
Maler, Eve, and Jeanne El Andaloussi. Developing SGML DTDs: from
+text to model to markup. Upper Saddle River, NJ: Prentice Hall PTR,
+1996.
diff --git a/tools/tsd/tsd-planning.org b/tools/tsd/tsd-planning.org
new file mode 100644
index 00000000..5c5391ed
--- /dev/null
+++ b/tools/tsd/tsd-planning.org
@@ -0,0 +1,332 @@
+#+title: Tag set documentation project
+#+author: CMSMcQ
+#+date: 28 May 2024
+This document outlines a plan for a workflow to create and maintain
+documentation for the XML vocabulary used for the XML form of ixml
+grammars (here called VXML), and the XML vocabulary used for test
+catalogs by the ixml Community Group.
+In its current form this document is not complete and /is binding on
+no one/. It is written to serve as a basis for discussion, and to
+record some thoughts and expectations.
+* Project overview
+** Primary deliverables
+The central deliverables are reference tag-set documentation (TSD) for
+the XML vocabularies in question.
+The tag-set documentation we wish to create consists of some
+expository prose and reference pages for the element types and
+widely used attributes.
+The crucial delivery format is XHTML; other XML vocabularies may be
+used for maintenance, but are not expected to be of interest to others.
+** Requirements
+Known requirements and desiderata:
+- It must be possible to update the documentation more or less
+ conveniently as the schemas change.
+- When the schema changes, human-supplied prose must be carried
+ forward easily.
+- Information derivable from the schema should be provided
+ automatically. Specifically: declarations, lists of parents,
+ lists of children, lists of attributes.
+- When schema-derived information changes, it is desirable that
+ the user be warned, so that any relevant prose can also be
+ updated.
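As a rough illustration of the third requirement, lists of children and parents can be derived by walking the RNG form of a schema. The sketch below is in Python rather than XSLT purely for brevity. The schema fragment in `SCHEMA` is a made-up miniature (the `test-case` content model shown is an assumption, not the real one), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical miniature schema, for illustration only.
SCHEMA = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="test-case"/></start>
  <define name="test-case">
    <element name="test-case"><ref name="Assertion"/></element>
  </define>
  <define name="Assertion">
    <choice>
      <ref name="assert-xml"/>
      <ref name="assert-not-a-sentence"/>
    </choice>
  </define>
  <define name="assert-xml">
    <element name="assert-xml"><text/></element>
  </define>
  <define name="assert-not-a-sentence">
    <element name="assert-not-a-sentence"><empty/></element>
  </define>
</grammar>
"""


def child_elements(schema_text):
    """Map each named element to the sorted names of its possible children."""
    root = ET.fromstring(schema_text)
    defines = {d.get("name"): d for d in root.iter(RNG + "define")}

    def collect(node, seen):
        # Gather element names reachable from node, following refs but
        # not descending into the content of a nested element.
        names = set()
        for child in node:
            if child.tag == RNG + "element":
                if child.get("name"):
                    names.add(child.get("name"))
            elif child.tag == RNG + "ref":
                target = child.get("name")
                if target in defines and target not in seen:
                    names |= collect(defines[target], seen | {target})
            else:
                names |= collect(child, seen)
        return names

    return {el.get("name"): sorted(collect(el, frozenset()))
            for el in root.iter(RNG + "element") if el.get("name")}


def parent_elements(children):
    """Invert the child map to obtain the possible parents of each element."""
    parents = {name: [] for name in children}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return {name: sorted(found) for name, found in parents.items()}


children = child_elements(SCHEMA)
parents = parent_elements(children)
```

The `seen` set guards against recursive patterns, which matters here because the real schema defines `any-element` recursively.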
+** Workflow
+The intended workflow is described in this diagram:
+#+CAPTION: Workflow plan
+#+ATTR_HTML: :width 25%
+That is:
+- The RNC/RNG schemas are maintained independently.
+ The test catalog schema is maintained by hand in RNC; the ixml
+ schema is generated automatically in RNG from the ixml grammar,
+ which is maintained by hand. We use trang to make an RNG form of
+ the test catalog schema, and an RNC form of the ixml schema.
+ Not shown: we use /jing -s/ to create a 'simplified' version of the
+ RNG schema. In some cases, this may require some hand work. (Jing
+ aborts with an error message if asked to simplify some schemas with
+ recursive patterns. The simplified schema also uses some rather
+ opaque names for patterns introduced by Jing.)
+- An XSLT stylesheet (/rng-to-TSD.xsl/) auto-generates tag-set
+ documentation for the schema.
+ If names and short descriptions are provided in the RNG annotation
+ namespace (/a:documentation/ elements), they should be carried over.
+ Otherwise, dummies should be provided.
+ This stylesheet draws information about the vocabulary from both the
+ RNG schema and the simplified RNG schema.
+- An XSLT stylesheet (/tsd-merger.xsl/) reads the auto-generated
+ documentation and the previous hand-edited version of the same
+ documentation, and produces merged output.
+ + For schema-derived information, the auto-generated documentation
+ is preferred; for other information (basically: the prose),
+ the hand-edited documentation is preferred.
+ + If any schema-derived information differs between the two
+ sources, the stylesheet should report the fact to the user.
+- An XSLT stylesheet (/tsd-to-html.xsl/) reads the merged tag-set
+ documentation and generates HTML with an appropriate stylesheet
+ (tsd.css).
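The merge rule in the workflow above can be sketched as follows. This is a minimal illustration in Python, not the planned tsd-merger.xsl. It assumes that refsynopsisdiv is the only schema-derived part of a refentry, that entries are matched on xml:id, and that the Docbook elements are un-namespaced (a simplification); the function names and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def canonical(element):
    """Canonical serialization, used to compare schema-derived parts."""
    if element is None:
        return None
    text = ET.tostring(element, encoding="unicode")
    return ET.canonicalize(text, strip_text=True)


def merge_tsd(auto_xml, manual_xml):
    """Merge auto-generated and hand-edited TSD; return (root, warnings)."""
    auto = ET.fromstring(auto_xml)
    manual = ET.fromstring(manual_xml)
    manual_entries = {e.get(XML_ID): e for e in manual.iter("refentry")}
    merged = ET.Element(auto.tag)
    warnings = []
    for auto_entry in auto.iter("refentry"):
        key = auto_entry.get(XML_ID)
        old = manual_entries.pop(key, None)
        if old is None:
            merged.append(auto_entry)  # new construct: keep the skeleton
            continue
        new_syn = auto_entry.find("refsynopsisdiv")
        if canonical(new_syn) != canonical(old.find("refsynopsisdiv")):
            warnings.append(f"{key}: schema-derived synopsis changed")
        # Prose comes from the hand-edited entry ...
        entry = ET.SubElement(merged, "refentry", dict(old.attrib))
        for child in old:
            if child.tag != "refsynopsisdiv":
                entry.append(child)
        # ... and the synopsis comes from the auto-generated entry.
        if new_syn is not None:
            first_is_name = len(entry) and entry[0].tag == "refnamediv"
            entry.insert(1 if first_is_name else 0, new_syn)
    for key in manual_entries:
        warnings.append(f"{key}: documented by hand but not in the schema")
    return merged, warnings


AUTO = (
    '<reference><refentry xml:id="element.p">'
    "<refnamediv><refname>p</refname>"
    "<refpurpose>[Description to be supplied.]</refpurpose></refnamediv>"
    '<refsynopsisdiv><synopsis role="rng-raw">new</synopsis></refsynopsisdiv>'
    "</refentry></reference>"
)
MANUAL = (
    '<reference><refentry xml:id="element.p">'
    "<refnamediv><refname>p</refname>"
    "<refpurpose>p: a paragraph of simple prose.</refpurpose></refnamediv>"
    '<refsynopsisdiv><synopsis role="rng-raw">old</synopsis></refsynopsisdiv>'
    "<refsection><title>Remarks</title></refsection>"
    "</refentry></reference>"
)
merged, warnings = merge_tsd(AUTO, MANUAL)
```

In this sketch the hand-written refpurpose and Remarks survive, the synopsis is replaced by the newly generated one, and the difference between the two synopses is reported as a warning.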
+** Secondary deliverables
+We have some secondary deliverables, whose purpose is to occupy
+the corresponding positions in the workflow.
+- Specification of the tag-set documentation vocabulary and
+ conventions to be used. Obvious candidates are Docbook, TEI P3, TEI
+ P5, and an ad hoc custom vocabulary.
+- /auto-tsd.xsl/
+- /tsd-merger.xsl/
+- /tsd-to-html.xsl/
+- /tsd.css/
+* Vocabulary for tag-set documentation
+We will use Docbook for the XML form of the tag-set documentation.
+Because Docbook's reference entries are rather generic, it may be
+helpful to specify the pattern to be followed there in more detail.
+Each /refentry/ element should contain:
+- /refnamediv/
+ + /refdescriptor/ containing either "(element)" or "(attribute)" or
+ "(pattern)"
+ + /refname/ with the element type name, attribute name, or pattern
+ name
+ + /refpurpose/ containing (a) an unabbreviated form of the element
+ type name, (b) a colon, and (c) a short description (typically one line)
+ of the meaning or use of the construct
+- /refsynopsisdiv/
+ + /synopsis role="rng-raw"/
+ - /info/ containing the 'raw' Relax NG declaration for the
+ construct:
+ + For elements, the /rng:element/ element.
+ + For attributes, the /rng:attribute/ element.
+ + For patterns, the /rng:define/ element.
+ For comparison: this is similar to inclusion of an element
+ declaration from a DTD with parameter entity references
+ unexpanded.
+ + /synopsis role="rng-simplified"/ (for elements and attributes only)
+ containing the corresponding declaration from the 'simplified'
+ schema for the construct.
+ For comparison: this is similar to inclusion of an element
+ declaration from a DTD with all parameter entity references
+ expanded.
+ + (optional, for elements) /synopsis role="structured"/ containing a
+ structured description in English of the content model of the
+ element. Not required, because in the usual case an English
+ summary can be generated from the simplified RNG without trouble.
+ Not forbidden, because it may be better to do this upstream rather
+ than in the creation of the HTML delivery form.
+- (optional, for elements) /refsection/ entitled "Contents" containing
+ a prose description in English of the allowed contents of the
+ element. Not required, because not always useful.
+- (for attributes) /refsection/ entitled "Data description"
+ with informal prose description of the attribute's datatype.
+- (for elements) /refsection/ entitled "Attributes" listing all
+ attributes defined for the element. If the attribute is used on
+ more than one element, then we want just the attribute name with a
+ hyperlink to the reference entry for the attribute; if the attribute
+ is used only on this element, or should be given custom
+ documentation for this parent, a version of the documentation
+ pattern for attributes (perhaps attenuated) should be given.
+- (optional) /refsection/ entitled "Remarks" with prose describing
+ relevant information -- whatever the user will need to know. For
+ elements and attributes this includes recognition criteria,
+ distinctions from similar elements or attributes, usage.
+- /refsection/ entitled "Examples" with prose and examples. In
+ some cases, this may just consist of references to examples
+ given in other reference entries.
+- (optional) /refsection/ entitled "Processing expectations".
+For example:
+#+begin_src Docbook-xml
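+<refentry xml:id="element.assert-xml">
+  <refnamediv>
+    <refdescriptor>(element)</refdescriptor>
+    <refname>assert-xml</refname>
+    <refpurpose>Assert-xml: asserts that the expected
+    output of a conforming ixml processor will be (or,
+    in cases of ambiguity, may be) the child element
+    of the /assert-xml/ element.</refpurpose>
+  </refnamediv>
+  <refsynopsisdiv>
+    <synopsis role="rng-raw">
+      <info>
+        <element xmlns="http://relaxng.org/ns/structure/1.0"
+                 xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
+                 name="assert-xml">
+          <ref name="external-atts"/>
+          <oneOrMore>
+            <ref name="any-element"/>
+          </oneOrMore>
+        </element>
+      </info>
+    </synopsis>
+    <synopsis role="rng-simplified">
+      <info>
+        <element xmlns="http://relaxng.org/ns/structure/1.0"
+                 xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
+                 name="assert-xml">
+          <group>
+            <zeroOrMore>
+              <attribute>
+                <anyName>
+                  <except>
+                    <nsName ns=""/>
+                  </except>
+                </anyName>
+                <text/>
+              </attribute>
+            </zeroOrMore>
+            <oneOrMore>
+              <ref name="_1"/>
+            </oneOrMore>
+          </group>
+        </element>
+      </info>
+    </synopsis>
+  </refsynopsisdiv>
+  <refsection>
+    <title>Contents</title>
+    <para>Any well-formed XML</para>
+  </refsection>
+  <refsection>
+    <title>Remarks</title>
+    <para>If the test catalog has a default namespace
+    declaration, it will be necessary to undeclare it in order
+    to avoid namespace capture of the asserted result. (IXML
+    output has no identified namespace.)</para>
+    <para>When comparing output of an ixml processor to the
+    asserted result, namespace declarations are to be
+    ignored.</para>
+  </refsection>
+</refentry>
+#+end_src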
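To make the pattern concrete, here is a sketch of how a skeletal refentry of this shape might be produced from a single rng:define. It is written in Python for illustration and is not the stylesheet itself. It assumes that a short description, when present, appears as an a:documentation child of the define or of its element, and it emits un-namespaced Docbook elements for brevity. The function name and the id convention (kind, a dot, the name) are assumptions modelled on the example above; only the rng-raw synopsis is generated.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
A_DOC = ("{http://relaxng.org/ns/compatibility/annotations/1.0}"
         "documentation")
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
PLACEHOLDER = "[Description to be supplied.]"


def skeleton_refentry(define):
    """Build a skeletal refentry from one rng:define element."""
    element = define.find(RNG + "element")
    kind = "element" if element is not None else "pattern"
    name = (element.get("name") if element is not None else None)
    name = name or define.get("name")

    # Carry over the short description if there is one, else a dummy.
    doc = define.find(A_DOC)
    if doc is None and element is not None:
        doc = element.find(A_DOC)
    has_text = doc is not None and doc.text and doc.text.strip()
    purpose = " ".join(doc.text.split()) if has_text else PLACEHOLDER

    entry = ET.Element("refentry", {XML_ID: f"{kind}.{name}"})
    namediv = ET.SubElement(entry, "refnamediv")
    ET.SubElement(namediv, "refdescriptor").text = f"({kind})"
    ET.SubElement(namediv, "refname").text = name
    ET.SubElement(namediv, "refpurpose").text = purpose

    # Raw declaration: rng:element for elements, rng:define for patterns.
    synopsisdiv = ET.SubElement(entry, "refsynopsisdiv")
    synopsis = ET.SubElement(synopsisdiv, "synopsis", {"role": "rng-raw"})
    info = ET.SubElement(synopsis, "info")
    info.append(element if element is not None else define)
    return entry


NS = ('xmlns="http://relaxng.org/ns/structure/1.0" '
      'xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"')
documented = ET.fromstring(
    f'<define {NS} name="p">'
    "<a:documentation>p: a paragraph of simple prose.</a:documentation>"
    '<element name="p"><ref name="phrases"/></element></define>'
)
bare = ET.fromstring(
    f'<define {NS} name="phrases"><zeroOrMore><text/></zeroOrMore></define>'
)
p_entry = skeleton_refentry(documented)
phrases_entry = skeleton_refentry(bare)
```

The remaining refsections (Contents, Attributes, Remarks, Examples) are left to the hand-edited document, which is where the merge step is expected to find them.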
+* Auto-generation of TSD (/auto-tsd.xsl/)
+[To be drafted.]
+* Merger of TSDs (/tsd-merger.xsl/)
+[To be drafted.]
+* HTML translation and display (/tsd-to-html.xsl/ and /tsd.css/)
+[To be drafted.]
+* Related work
+There has been reference documentation for SGML and XML tag sets for
+about as long as there have been SGML and XML tag sets intended for
+serious use, but there has been very little standardization on the
+form of such documentation. Among the examples which have influenced
+this work are:
+- Formex (1985). The name stands for "formalized exchange of electronic
+ publications". The manual includes expository prose; its reference
+ documentation includes two lists of data elements, one for a format
+ called CCF (common communications format, an implementation of
+ ISO 2709) and one for an SGML document type definition. The
+ reference page for each element type includes:
+ + A symbolic representation of the element's generic identifier
+ (element type name) and attributes e.g. ~<AB LA = ...>~ for
+ the AB (abstract) element with its LA (language) attribute.
+ + A definition (or terse prose description) of the element.
+ + A data description of the element, describing its content and
+ format.
+ + A usage note specifying whether the element is mandatory or
+ optional, repeatable or non-repeatable.
+ + A grouping note specifying what 'groups' the element is part
+ of (this appears to be a list of possible parents or possibly
+ higher-level containers).
+ + An example (not always present).
+ For each attribute, the reference page gives:
+ + A definition (or terse prose description) of the attribute.
+ + A data description defining the set of possible attribute
+ values.
+ Not listed here but prominent on each page are cross references to
+ the corresponding CCF data elements.
+- Maler and El Andaloussi (1996). Maler and El Andaloussi recommend
+ the following as the "minimal information" for element reference
+ documentation:
+ + Short name or actual generic identifier (e.g. ~olist~).
+ + Full name: descriptive phrase that explains the short name
+ (e.g. "An ordered list of related items.").
+ + Synopsis: rules for using the element, perhaps including tree
+ diagrams showing possible parents and children.
+ + Description: purpose, how and where it should be used,
+ recognition criteria, etc.
+ + Attributes: reference description for each attribute.
+ + Contents and Contexts (if not already present in the description
+ and if not clearly conveyed by the synopsis).
+ + Examples.
+ + Processing notes, including notes on how to work around
+ shortcomings in current tools.
+- JATS documentation (current). This is a reasonably typical example
+ of the tag-set documentation supplied by at least some commercially
+ active SGML and XML consultants. Reference material for an element
+ includes:
+ + Generic identifier / element type name
+ + Full name
+ + Annotation (to specify which DTDs in a set contain the element)
+ + Definition
+ + Remarks
+ + Related elements
+ + Content model
+ + Content description (prose)
+ + Presentation information (expected styling)
+ + Examples (with prose commentary)
+ + Related resources (pointers to other relevant information)
+ + Source (if adapted from some other tag set)
+ + Module (in a multi-module vocabulary)
+ + Revision history
+ There are similar structures for attributes and parameter entities.
+- TEI P3 (1994). The auxiliary document type for 'tag set
+ documentation' allows for each element type:
+ + generic identifier
+ + full name
+ + short description (typically a one-liner)
+ + list of attributes (with reference information for each)
+ + examples (with commentary and explanation)
+ + remarks
+ + information on the part of TEI where the element is defined,
+ the classes it belongs to, and the file(s) it is defined in
+ + a data description (in prose)
+ + a list of parents
+ + a list of children
+ + the text of the element's declaration
+ + the text of the element's attribute-list declaration
+ + hyperlinks to relevant documentation
+ + a list of equivalent elements (in other vocabularies)
+ TEI P5 (current) has modified the tag-set documentation of P3 and
+ made many of the elements less specific.
+The reference material of Docbook has also had an obvious influence.
+* References
+- /Formex: formalized exchange of electronic publications/,
+ ed. C. Guittet. Luxembourg: Office for Official Publications of the
+ European Communities, 'New Technologies -- Project Management'
+ Department, 1985. 243 pp.
+- Maler, Eve, and Jeanne El Andaloussi. /Developing SGML DTDs: from
+ text to model to markup./ Upper Saddle River, NJ: Prentice Hall PTR,
+ 1996.