Internationalization self-test for the EPUB 3.3 spec
The particularity of EPUB is its structure (see also the overview diagram). To a first approximation, an EPUB instance is a packaged Web site. The real content is in XHTML + CSS, SVG + CSS, possibly MathML, images, etc. The EPUB specification does not redefine these content-specific formats; it just refers to them. This also means that most internationalization features (say, typography, text search, writing direction, or the localization of items like names or dates) depend on the i18n features of those formats, and EPUB takes these for granted. Because those formats are the subjects of W3C specifications that have undergone rigorous i18n review, it is not necessary to repeat those i18n reviews for the EPUB specification proper.
EPUB does add additional information and structure to the collection of content files. These are:
- A set of XML files for the physical packaging format, called the Open Container Format.
- A navigation document that is used by a reading system to display the table of contents. The navigation file is defined to be in XHTML format, using standard markup.
- A package document, essentially a set of metadata items that governs the behavior of “Reading Systems”, i.e., the software and hardware that present the EPUB content to the end user. This document is defined in XML.
The Open Container Format has no user-facing or textual content; it is therefore irrelevant as far as i18n is concerned. Also, because the navigation document is in XHTML, the comment above applies to it, too: its internationalization and localization features depend on the i18n features of XHTML + CSS.
The package document, however, does include textual information that directly or indirectly influences the behavior of reading systems, and it therefore needs an i18n review. In other words, this self-review covers the i18n features of the package document.
A further fact about the package document that is important for this review: the “textual” elements, i.e., those XML elements within the package document that contain natural language text (title, creator, subject, accessibility summary, etc.), are not specified directly by the EPUB specification either. Those elements are all either Dublin Core metadata terms and elements, controlled by the Dublin Core™ Metadata Initiative (DCMI), or schema.org elements, controlled by the schema.org process. The EPUB specification “just” uses them. The “native” elements of the specification, i.e., XML elements defined in a namespace controlled by this specification, are all “structural” elements (e.g., links, so-called spine items) that do not contain textual content.
The EPUB 3.3 specification consists of three Recommendations in preparation:
- EPUB 3.3 specifies the content structure of an EPUB 3.3 document. This is the core specification for the authors of an EPUB 3.3 publication, and specifies the features described above.
- EPUB 3.3 Reading Systems specifies the conformance requirements for EPUB 3.3 reading systems, which comprise stand-alone reading applications and software embedded in a reading device, but also the behavior of a browser extension.
- EPUB Accessibility 1.1 specifies the content conformance requirements for verifying the accessibility of EPUB publications.
Note that the EPUB 3.3 Reading Systems specification says, as part of its conformance requirements:
It MUST honor all presentation logic expressed through the Package Document [EPUB-33] (e.g., the reading order, fallback chains, page progression direction and fixed layouts).
As for the EPUB Accessibility 1.1 document, it concentrates on accessibility requirements for EPUB Publications; in some sense, its relationship to the other specifications is a bit like the relationship of the WCAG Recommendations to HTML.
As a consequence, each check below refers primarily to the EPUB 3.3 specification itself; unless otherwise stated, the requirement quoted above means that it also covers the checks for the EPUB 3.3 Reading Systems and EPUB Accessibility 1.1 documents.
Using the short i18n review checklist, the following items are relevant for the EPUB specification:
- If the spec (or its implementation) contains any natural language text that will be read by a human (this includes error messages or other UI text, JSON strings, etc.).
- If the spec (or its implementation) deals with time in any way that will be read by humans and/or crosses time zone boundaries.
- If the spec (or its implementation) defines markup.
- If the spec (or its implementation) deals with names, addresses, time & date formats, etc.
- If the spec (or its implementation) describes a format or data that is likely to need localization.
- If the spec (or its implementation) makes any reference to or relies on any cultural norms.
yielding the detailed checklist items below.
This checklist was extracted from the i18n self-review checklist, using the outcome of the short checklist above.
- It should be possible to associate a language with any piece of natural language text that will be read by a user.
  EPUB 3.3 Check: There are two settings:
  - The XML xml:lang attribute is used for the package document and its enclosed (XML) elements, indicating the language of the metadata items (title, publishers, etc.). See the section on shared attributes in the spec. For those, the xml:lang specification applies.
  - The separate dc:language element specifies the language of the publication, which may control the search and categorization features, but also the user interface provided by the Reading System. This value is not inherited by the content documents, which must set the language locally (according to the HTML5 or SVG rules).
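To make the split between the two settings concrete, here is a minimal Python sketch (standard library only) that reads a trimmed-down, hypothetical package document, resolves the effective xml:lang of each metadata entry by inheritance, and separately lists the dc:language values. The sample document and the helper names metadata_languages and publication_languages are illustrative assumptions, not part of the EPUB specification or of any EPUB library.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

DC = "http://purl.org/dc/elements/1.1/"
# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, trimmed-down package document (manifest and spine omitted, so it
# is not a valid EPUB package; it only illustrates the two language settings).
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="uid" xml:lang="en">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>The Tale of Genji</dc:title>
    <dc:title xml:lang="ja">源氏物語</dc:title>
    <dc:language>ja</dc:language>
    <dc:language>en</dc:language>
  </metadata>
</package>"""


def metadata_languages(elem: ET.Element, inherited: Optional[str] = None
                       ) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (local name, text, effective xml:lang) for elements with text.

    xml:lang is inherited from the nearest ancestor that sets it; a value on
    the element itself (even an empty one) overrides the inherited value.
    """
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield elem.tag.rsplit("}", 1)[-1], elem.text.strip(), lang
    for child in elem:
        yield from metadata_languages(child, lang)


def publication_languages(root: ET.Element) -> List[str]:
    """Return the dc:language values in document order (the first is primary)."""
    return [e.text.strip() for e in root.iter(f"{{{DC}}}language") if e.text]


root = ET.fromstring(PACKAGE)
for name, text, lang in metadata_languages(root):
    print(f"{name:<11} xml:lang={lang!r:<5} {text}")
print("dc:language:", publication_languages(root))
```

Note how the second dc:title overrides the inherited "en", while the primary language of the publication comes from the first dc:language element ("ja"), independently of any xml:lang value.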
- Where possible, there should be a way to label natural language changes in inline text.
  EPUB 3.3 Check: The xml:lang attribute is applicable to all metadata elements that have textual content.
- Consider whether it is useful to express the intended linguistic audience of a resource, in addition to specifying the language used for text processing.
  EPUB 3.3 Check: The package document includes the dc:language element, which “specifies the language of the content of the EPUB Publication” (as opposed to the language of the metadata entries). This element is REQUIRED in the package document. Note that the package document may contain several dc:language elements; this is used, e.g., for multi-language publications.
- A language declaration that indicates the text processing language for a range of text must associate a single language value with a specific range of text.
  EPUB 3.3 Check: This is covered by the xml:lang attribute (a "range" of text being a single metadata element in this context).
- Use the HTML lang and XML xml:lang language attributes where appropriate to identify the text processing language, rather than creating a new attribute or mechanism.
  EPUB 3.3 Check: xml:lang is used where appropriate.
- It should be possible to associate a metadata-type language declaration (which indicates the intended use of the resource rather than the language of a specific range of text) with multiple language values.
  EPUB 3.3 Check: The package document may contain several dc:language elements; this may be used for multi-language publications (with the first dc:language element considered to be the “primary” language).
- Attributes that express the language of external resources should not use the HTML lang and XML xml:lang language attributes, but should use a different attribute when they represent metadata (which indicates the intended use of the resource rather than the language of a specific range of text).
  EPUB 3.3 Check: See the specification of the link element, which introduces the hreflang attribute for links from the package document.
- Values for language declarations must use BCP 47.
  EPUB 3.3 Check: The value of xml:lang is defined by the relevant section of the XML specification (which refers to BCP 47). The value of the dc:language element is defined by DCMI to be a BCP 47 language tag.
- Refer to BCP 47, not to RFC 5646.
  EPUB 3.3 Check: BCP 47 is used.
- Be specific about what level of conformance you expect for language tags: BCP 47 defines two levels of conformance, "valid" and "well-formed".
  EPUB 3.3 Check: dc:language is specified by DCMI to be well-formed per BCP 47; this is reinforced in the EPUB specification. The same is done for xml:lang.
- Specifications may require implementations to check if language tags are "valid", but in most circumstances should only require that the language tags be "well-formed".
  EPUB 3.3 Check: dc:language is specified by DCMI to be well-formed per BCP 47; this is reinforced in the EPUB specification. The same is done for xml:lang.
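Because the required conformance level is "well-formed", a checker only needs the BCP 47 syntax (the ABNF of RFC 5646), not the IANA Language Subtag Registry. The Python sketch below transcribes that ABNF into a regular expression. It is a simplified illustration with a hypothetical function name, is_well_formed; it is not the check performed by epubcheck or by any other specific tool.

```python
import re

# Subtag productions transcribed from the RFC 5646 ABNF (BCP 47).
LANGUAGE = r"(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4}|[A-Za-z]{5,8})"
SCRIPT = r"(?:-[A-Za-z]{4})?"
REGION = r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"
VARIANT = r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"
# Extension singletons are any single alphanumeric except "x"/"X".
EXTENSION = r"(?:-[0-9A-WYZa-wyz](?:-[A-Za-z0-9]{2,8})+)*"
PRIVATE_USE = r"[xX](?:-[A-Za-z0-9]{1,8})+"

LANGTAG = LANGUAGE + SCRIPT + REGION + VARIANT + EXTENSION + f"(?:-{PRIVATE_USE})?"
WELL_FORMED = re.compile(f"(?:{LANGTAG}|{PRIVATE_USE})")

# "Irregular" grandfathered tags do not fit the langtag production, so they are
# listed explicitly; the "regular" grandfathered tags already match the pattern.
IRREGULAR = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao", "i-tay",
    "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
}


def is_well_formed(tag: str) -> bool:
    """True if tag matches the BCP 47 syntax (no registry lookup is done)."""
    return tag.lower() in IRREGULAR or WELL_FORMED.fullmatch(tag) is not None


for tag in ["en", "zh-Hant-TW", "es-419", "de-CH-1996", "i-klingon", "en_US"]:
    print(tag, is_well_formed(tag))
```

Syntactically correct tags whose subtags are not registered also pass this check; rejecting unregistered subtags, duplicate variants, or duplicate extension singletons is what the "valid" level adds.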
- Specifications should require content and content authors to use "valid" language tags.
  EPUB 3.3 Check (Negative): dc:language is specified by DCMI to be well-formed per BCP 47; this is reinforced in the EPUB specification. There is no requirement to use "valid" language tags. See also issue 1509, which details the reasons (mostly related to the role of epubcheck).
- Reference BCP 47 for language tag matching.
  EPUB 3.3 Check: BCP 47 is used.
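BCP 47 also covers language tag matching (RFC 4647). As an illustration of how a Reading System might pick, say, which of several dc:title translations to show based on a user's language preferences, here is a Python sketch of the RFC 4647 "lookup" scheme. The function name lookup and its signature are hypothetical, EPUB itself does not mandate a particular matching scheme, and wildcard ranges are simply skipped in this simplified version.

```python
from typing import Iterable, Optional


def lookup(priority_list: Iterable[str], available: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """RFC 4647 lookup: return the best available tag for the user's ranges.

    Each range is progressively truncated from the end until it equals an
    available tag (compared case-insensitively). If truncation leaves a
    single-character subtag at the end (e.g. the "x" of a private-use
    sequence), that subtag is removed as well.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard is ignored in this simplified sketch
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


titles = {"en": "The Tale of Genji", "ja": "源氏物語"}
choice = lookup(["ja-JP", "en"], titles.keys(), default="en")
print(titles[choice])
```

Lookup returns at most one tag, which suits choosing a single string to display; RFC 4647 also defines "filtering" schemes that return every matching tag instead.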
- The specification should indicate how to define the default text-processing language for the resource as a whole.
  EPUB 3.3 Check: For the metadata entries, the xml:lang processing model applies. For the publication as a whole, the package document MUST include a valid dc:language element.
- Content within the resource should inherit the text-processing language declared at the resource level, unless it is specifically overridden.
  EPUB 3.3 Check: This is the xml:lang processing model.
- Consider whether it is necessary to have separate declarations to indicate the text-processing language versus metadata about the expected use of the resource.
  EPUB 3.3 Check: This is achieved by the separation between the xml:lang attribute, the dc:language element, and the language settings in the individual content documents.
- If there is only one language declaration for a resource, and it has more than one language tag as a value, it must be possible to identify the default text-processing language for the resource.
  EPUB 3.3 Check: n/a. The xml:lang attribute can only take a single value. For dc:language, if several values are used, the first one is considered to be the "main" one. (See spec text.)
- By default, blocks of content should inherit any text-processing language set for the resource as a whole.
  EPUB 3.3 Check: n/a for the metadata values, except that the package-level value of xml:lang can be overridden if it is explicitly specified on an (XML) element.
- It should be possible to indicate a change in language for blocks of content where the language changes.
  EPUB 3.3 Check: xml:lang processing does that.
- It should be possible to indicate language for spans of inline text where the language changes.
  EPUB 3.3 Check (Negative): The content of the relevant metadata items (title, authors, accessibility summary, etc.) is defined as strings by DCMI or schema.org. The content is in Unicode, which means that bidi should be used, but no internal structure can be defined.
- It must be possible to indicate base direction for each individual paragraph-level item of natural language text that will be read by someone.
  EPUB 3.3 Check:
  - The top-level element in the package document, as well as the elements with text content, can use the dir attribute, with possible values of ltr, rtl, or auto.
  - The spine element, which lists the reading order of the content, has the optional page-progression-direction attribute, which sets the direction at the publication level (e.g., for the placement of the table of contents by the Reading System or for any other user interface feature).
- It must be possible to indicate base direction changes for embedded runs of inline bidirectional text for all natural language text that will be read by someone.
  EPUB 3.3 Check (Negative): The content of the relevant metadata items (title, authors, etc.) is defined as strings by DCMI or schema.org. The content is in Unicode, which means that bidi should be used, but no internal structure can be defined.
- Annotating right-to-left text must require the minimum amount of effort for people who work natively with right-to-left scripts.
  EPUB 3.3 Check: n/a. EPUB does not define any annotation behavior.
- Do not assume that direction can be determined from language information.
  EPUB 3.3 Check: This is covered by the definition of the dir attribute.
  EPUB 3.3 Reading Systems Check: This is covered by the dir attribute processing.
- Values for the default base direction should include left-to-right, right-to-left, and auto.
  EPUB 3.3 Check: This is covered by the definition of the dir attribute.
The content of this section is not relevant for EPUB, insofar as the metadata in a package document is only a collection of strings; no markup is defined.
- The spec should indicate how to define a default base direction for the resource as a whole, i.e., set the overall base direction.
  EPUB 3.3 Check: n/a.
- The default base direction, in the absence of other information, should be LTR.
  EPUB 3.3 Check: n/a.
- The content author must be able to indicate parts of the text where the base direction changes. At the block level, this should be achieved using attributes or metadata, and should not rely on Unicode control characters.
  EPUB 3.3 Check: n/a.
- It must be possible to also set the direction for content fragments to auto. This means that the base direction will be determined by examining the content itself.
  EPUB 3.3 Check: n/a.
- If the overall base direction is set to auto for plain text, the direction of content paragraphs should be determined on a paragraph-by-paragraph basis.
  EPUB 3.3 Check: n/a.
- To indicate the sides of a block of text relative to the start and end of its contained lines, you should use 'before' and 'after' (maybe block-start/block-end; the terminology is changing), rather than 'top' and 'bottom'.
  EPUB 3.3 Check: n/a.
- To indicate the start/end of a line you should use 'start' and 'end' rather than 'left' and 'right'.
  EPUB 3.3 Check: n/a.
- Provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.
  EPUB 3.3 Check: n/a.
- Provide metadata constructs that can be used to indicate the base direction of any natural language string.
  EPUB 3.3 Check: This is the role of the dir attribute.
- Specify that consumers of strings should use heuristics, preferably based on the Unicode Standard first-strong algorithm, to detect the base direction of a string except where metadata is provided.
  EPUB 3.3 Check: Covered by the dir attribute specification.
  EPUB 3.3 Reading Systems Check: Covered by the dir attribute behavior.
- Where possible, define a field to indicate the default direction for all strings in a given resource or document.
  EPUB 3.3 Check: This is the role of the dir attribute.
- Do NOT assume that creating a document-level default without the ability to change direction for any string is sufficient.
  EPUB 3.3 Check: The dir attribute can be set on all elements.
- If metadata is not available due to legacy implementations and cannot otherwise be provided, specifications MAY allow a base direction to be interpolated from available language metadata.
  EPUB 3.3 Check: n/a.
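The first-strong heuristic mentioned in the checklist items above can be sketched in a few lines of Python with the standard unicodedata module. This is a simplified reading of rules P2 and P3 of the Unicode Bidirectional Algorithm (UAX #9). It assumes, for illustration only, that an explicit dir value of ltr or rtl on a metadata item wins and that auto (or a missing dir) falls back to the heuristic; the helper names are hypothetical.

```python
import unicodedata
from typing import Optional

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' from the first strong character, else None.

    Characters inside isolates (LRI/RLI/FSI ... PDI) are skipped, as in
    rule P2 of the Unicode Bidirectional Algorithm.
    """
    depth = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            depth = max(depth - 1, 0)
        elif depth == 0 and bidi_class == "L":
            return "ltr"
        elif depth == 0 and bidi_class in ("R", "AL"):
            return "rtl"
    return None


def resolve_direction(text: str, declared: Optional[str] = None) -> str:
    """Hypothetical resolution: an explicit ltr/rtl wins, otherwise first-strong."""
    if declared in ("ltr", "rtl"):
        return declared
    # "auto" or no dir attribute: use the heuristic, defaulting to LTR when
    # the string has no strong character at all (as rule P3 does).
    return first_strong_direction(text) or "ltr"


print(resolve_direction("ألف ليلة وليلة"))
print(resolve_direction("2023", declared="auto"))
print(resolve_direction("ألف ليلة وليلة", declared="ltr"))
```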
- Specifications MUST NOT require the production or use of paired bidi controls.
  EPUB 3.3 Check: The specification does not go into these details.
There is no mechanism to set inline directionality in the metadata elements beyond what Unicode provides and beyond what can be set for the metadata item as a whole. All relevant elements have been defined by DCMI or schema.org, and this specification cannot change them by adding internal XML or HTML structures. Bidi should be handled by relying on the Unicode RLM/LRM marker characters.
- It must be possible to indicate spans of inline text where the base direction changes. If markup is available, this is the preferred method. Otherwise your specification must require that Unicode control characters are recognized by the receiving application, and correctly implemented.
  EPUB 3.3 Check: The reference is to the core Unicode Bidirectional Algorithm, which covers this.
  EPUB 3.3 Reading Systems Check: The processing behavior of dir specifies this.
- It must be possible to also set the direction for a span to auto. This means that the base direction will be determined by examining the content itself. A typical approach here would be to set the direction based on the first strong directional character outside of any markup.
  EPUB 3.3 Check: n/a. There is no extra markup for the metadata items.
- If users use Unicode bidirectional control characters, the isolating RLI/LRI/FSI with PDI characters must be supported by the application and recommended (rather than RLE/LRE with PDF) by the spec.
  EPUB 3.3 Check: The reference is to the core Unicode Bidirectional Algorithm, which covers this.
  EPUB 3.3 Reading Systems Check: The processing behavior of dir specifies this.
- Use of RLM/LRM should be appropriate, and expectations of what those controls can and cannot do should be clear in the spec.
  EPUB 3.3 Check: The reference is to the core Unicode Bidirectional Algorithm, which covers this.
  EPUB 3.3 Reading Systems Check: The processing behavior of dir specifies this.
- For markup, provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.
  EPUB 3.3 Check: This is the role of the dir attribute, but only on the full metadata item.
- For markup, allow bidi attributes on all inline elements in markup that contain text.
  EPUB 3.3 Check: This is the role of the dir attribute, but only on the full metadata item.
- For markup, provide attributes that allow the user to (a) create an embedded base direction or (b) override the bidirectional algorithm altogether; the attribute should allow the user to set the direction to LTR or RTL or the aforementioned Auto in either of these two scenarios.
  EPUB 3.3 Check: This is the role of the dir attribute, but only on the full metadata item.
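When a Reading System embeds a metadata string (for example, a title whose direction comes from its dir attribute) into its own user-interface text, the isolating controls recommended in the items above keep that string from reordering its surroundings. A minimal Python sketch, assuming a hypothetical helper named isolate that maps the dir values ltr, rtl, and auto onto LRI, RLI, and FSI, each closed by PDI:

```python
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in a directional isolate (LRI/RLI/FSI ... PDI).

    Unlike the embedding controls (LRE/RLE ... PDF), an isolate keeps the
    enclosed text from affecting the ordering of the text around it.
    """
    try:
        opener = _OPENERS[direction]
    except KeyError:
        raise ValueError(f"unsupported direction: {direction!r}") from None
    return f"{opener}{text}{PDI}"


title = "ألف ليلة وليلة"
label = f"Now reading: {isolate(title, 'rtl')} (chapter 3)"
print(repr(label))
```

By contrast, RLM (U+200F) and LRM (U+200E) are invisible strong characters: they can influence how neighboring neutral characters are resolved, but they do not isolate anything.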
- Since specifications in general need both a definition for their characters and the semantics associated with these characters, specifications SHOULD include a reference to the Unicode Standard, whether or not they include a reference to ISO/IEC 10646.
  EPUB 3.3 Check: The reference is: “The Unicode Standard. Unicode Consortium. URL: https://www.unicode.org/versions/latest/”
- A generic reference to the Unicode Standard MUST be made if it is desired that characters allocated after a specification is published are usable with that specification. A specific reference to the Unicode Standard MAY be included to ensure that functionality depending on a particular version is available and will not change over time.
  EPUB 3.3 Check: See above.
- All generic references to the Unicode Standard MUST refer to the latest version of the Unicode Standard available at the date of publication of the containing specification.
  EPUB 3.3 Check: The reference above is the only reference in the spec.
- All generic references to ISO/IEC 10646 MUST refer to the latest version of ISO/IEC 10646 available at the date of publication of the containing specification.
  EPUB 3.3 Check: The reference above is the only reference in the spec.
- Do not define attribute values that will contain user-readable content. Use elements for such content.
  EPUB 3.3 Check: No such attribute is defined.
- If you do define attribute values containing user-readable content, provide a means to indicate directional and language information for that text separately from the text contained in the element.
  EPUB 3.3 Check: n/a.
- Provide a way for authors to annotate arbitrary inline content using a span-like element or construct.
  EPUB 3.3 Check: For metadata items that may have translations and/or alternate script representations, the specification provides a way to repeat the content in different languages and scripts using the refines mechanism; see the example in the spec.
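As a sketch of how the refines mechanism carries alternate script representations, the Python below collects the meta entries with property="alternate-script" that refine a given dc:creator. The sample metadata is illustrative, and the helper name alternate_scripts is hypothetical; for brevity it reads only an xml:lang set directly on the meta element, whereas a fuller version would also apply xml:lang inheritance.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

OPF = "http://www.idpf.org/2007/opf"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative metadata: one creator plus an alternate-script rendering of the name.
METADATA = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator id="creator">Haruki Murakami</dc:creator>
  <meta refines="#creator" property="alternate-script" xml:lang="ja">村上 春樹</meta>
</metadata>"""


def alternate_scripts(metadata: ET.Element, element_id: str
                      ) -> List[Tuple[Optional[str], str]]:
    """Return (xml:lang, text) for alternate-script metas refining #element_id."""
    target = f"#{element_id}"
    return [
        (meta.get(XML_LANG), (meta.text or "").strip())
        for meta in metadata.iter(f"{{{OPF}}}meta")
        if meta.get("refines") == target
        and meta.get("property") == "alternate-script"
    ]


print(alternate_scripts(ET.fromstring(METADATA), "creator"))
```

Because each rendering is a separate element, each one carries its own xml:lang (and, where needed, dir), which is how per-string language and direction information stays available without inline markup.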
- Identifiers should be case-sensitive.
  EPUB 3.3 Check: The relevant portion of the spec is based on XML, which is case-sensitive.
- Avoid natural language text in elements that only allow for plain text and in attribute values.
  EPUB 3.3 Check: This approach is followed in the spec.
- Provide a span-like element that can be used for any text content to apply information needed for internationalization.
  EPUB 3.3 Check (Negative): All textual content metadata are defined by DCMI, and EPUB cannot add extra internal structure.
- When defining data formats, use locale-neutral serialization forms.
  EPUB 3.3 Check: n/a.
Check for all: EPUB includes the dc:date and dcterms:modified elements (also defined by DCMI), with the following constraints on their values:
- For dc:date, it is “RECOMMENDED that the date string conform to [ISO8601], particularly the subset expressed in W3C Date and Time Formats [DateTime], as such strings are both human and machine readable” (see the DCMI specification).
- The value of dcterms:modified “MUST be an [XMLSCHEMA-2] dateTime conformant date of the form: CCYY-MM-DDThh:mm:ssZ” (see the DCMI specification).
- When defining calendar and date systems, be sure to allow for dates prior to the common era, or at least define handling of dates outside the most common range.
- When defining time or date data types, ensure that the time zone or relationship to UTC is always defined.
- Provide a health warning for conversion of time or date data types that are "floating" to/from incremental types, referring as necessary to the Time Zones WG Note.
- Allow for leap seconds in date and time data types.
- Use consistent terminology when discussing date and time values. Use 'floating' time for time zone independent values.
- Keep separate the definition of time zone from time zone offset.
- Use IANA time zone IDs to identify time zones. Do not use offsets or LTO as a proxy for time zone.
- Use a separate field to identify time zone.
- When defining rules for a "week", allow for culturally specific rules to be applied.
- When defining rules for week number of year, allow for culturally specific rules to be applied.
- When non-Gregorian calendars are permitted, note that the "month" field can go to 13 (undecimber).
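The dcterms:modified form quoted above is locale-neutral and anchored to UTC by its trailing Z, so it can be checked mechanically. Here is a minimal Python sketch of such a validator (the function name parse_modified is hypothetical). It rejects other serializations, such as explicit offsets or date-only values, and returns a timezone-aware datetime.

```python
import re
from datetime import datetime, timezone

# ASCII digits only: in Python, \d would also accept non-ASCII decimal digits.
_MODIFIED = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z")


def parse_modified(value: str) -> datetime:
    """Parse a CCYY-MM-DDThh:mm:ssZ value into an aware UTC datetime."""
    if not _MODIFIED.fullmatch(value):
        raise ValueError(f"not of the form CCYY-MM-DDThh:mm:ssZ: {value!r}")
    # strptime also rejects impossible values such as month 13 or February 30.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(parse_modified("2023-05-25T12:00:00Z").isoformat())
```

Note that Python's datetime type has no representation for a leap second (its seconds field is limited to 0-59), so this sketch cannot return a value such as 23:59:60Z. That is exactly the kind of limitation the leap-second item above warns about.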
These all relate to the dc:creator and dc:contributor elements, as defined by DCMI.
- Check whether you really need to store or access given name and family name separately.
  EPUB 3.3 Check: The specification does not require separate given and family names.
- Avoid placing limits on the length of names, or if you do, make allowance for long strings.
  EPUB 3.3 Check: There is no limit.
- Try to avoid using the labels 'first name' and 'last name' in non-localized contexts.
  EPUB 3.3 Check: No such labels are used.
- Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where users can provide part(s) of their name that you need to use for a specific purpose.
  EPUB 3.3 Check: This is provided by the refines mechanism combined with the role property.
- Allow for users to be asked separately how they would like to be addressed when someone contacts them.
  EPUB 3.3 Check: n/a.
- If parts of a person's name are captured separately, ensure that the separate items can capture all relevant information.
  EPUB 3.3 Check: n/a.
- Be careful about assumptions built into algorithms that pull out the parts of a name automatically.
  EPUB 3.3 Check: n/a.
- Don't assume that a single letter name is an initial.
  EPUB 3.3 Check: n/a.
- Don't require that people supply a family name.
  EPUB 3.3 Check: n/a.
- Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.
  EPUB 3.3 Check: n/a.
- Don't require names to be entered all in upper case.
  EPUB 3.3 Check: n/a.
- Allow the user to enter a name with spaces.
  EPUB 3.3 Check: n/a.
- Don't assume that members of the same family will share the same family name.
  EPUB 3.3 Check: n/a.
- It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.
  EPUB 3.3 Check: n/a.
- You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, as separate items.
  EPUB 3.3 Check: n/a.