diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 3606f8b..4e72bf6 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -6,39 +6,44 @@ While the default context is very flexible in the way each metadata can be repre
 
 Related Repository: [Readium Web Publication Manifest](https://github.com/readium/webpub-manifest)
 
-## Title
+## Localized Strings
+
+In many cases, the default context supports alternate representations of the same string in different scripts and languages by means of JSON-LD language maps.
+To fill such a map from an EPUB metadata element, proceed as follows:
+
+* Determine the language used in the content of the carrying element as defined in [the XML specification](https://www.w3.org/TR/xml/#sec-lang-tag),
+  i.e. check whether the carrying element has or inherits an `xml:lang` attribute.
+* In the EPUB 3.x case, check whether the element is refined by `meta` elements whose `property` is `alternate-script` and that have or inherit an `xml:lang` attribute.
+  For each one, add an entry to the map that associates the language of the `meta` element with its content.
+* When no language hint is available, use `null` or `und` depending on the platform.
+
+## Sorting Keys
 
-The `title` of a publication is an object where each key is a BCP 47 language tag and each value of this key is a string.
+Localized sorting keys are supported in RWPM for the publication title, contributor and collection names, and subject names. When computing such a localized string, use the language of the carrying element as defined in [the XML specification](https://www.w3.org/TR/xml/#sec-lang-tag) and fall back to `null` or `und`.
+
+
+## Title
 
-In addition to `title`, a publication may also contain a `sortAs` string, used to sort the title as well.
+The `title` and `sortAs` keys of a publication are objects where each key is a BCP 47 language tag and each value is a string.
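The localized-string procedure added in this hunk can be sketched as follows. This is a minimal sketch, not Readium code: it assumes the package document is parsed with Python's `xml.etree.ElementTree`, the names `localized_string` and `language_in_scope` are hypothetical, and it picks `und` (rather than `null`) for the no-language case.

```python
import xml.etree.ElementTree as ET

# Clark-notation names used in an EPUB package document (OPF).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
OPF_META = "{http://www.idpf.org/2007/opf}meta"

# The text allows `null` or `und` "depending on the platform"; this sketch uses "und".
NO_LANGUAGE = "und"


def language_in_scope(elem, parents):
    """Return the xml:lang that `elem` has or inherits from its ancestors."""
    node = elem
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            # Per the XML spec, xml:lang="" means "no language information".
            return lang or NO_LANGUAGE
        node = parents.get(node)
    return NO_LANGUAGE


def localized_string(elem, root):
    """Build a {language tag: string} map for a metadata element such as dc:title."""
    # ElementTree has no parent pointers, so build a child -> parent index.
    parents = {child: parent for parent in root.iter() for child in parent}
    result = {language_in_scope(elem, parents): (elem.text or "").strip()}

    # EPUB 3.x: alternate-script refines point back at the element via refines="#id".
    elem_id = elem.get("id")
    if elem_id:
        for meta in root.iter(OPF_META):
            if (meta.get("refines") == "#" + elem_id
                    and meta.get("property") == "alternate-script"):
                result[language_in_scope(meta, parents)] = (meta.text or "").strip()
    return result
```

For example, a `dc:creator` without its own `xml:lang` inside a `package` carrying `xml:lang="en"`, refined by an `alternate-script` meta with `xml:lang="ja"`, yields a two-entry map keyed `en` and `ja`. Rebuilding the parent index on every call is wasteful for large packages; a real parser would compute it once per document.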
 
 When parsing an EPUB, we need to establish:
 
 * which title is the primary one
-* the language(s) used to express the primary title along with the associated strings
-* the string used to sort the title of the publication
-* the subtitle of the publication
-* the default language for metadata
+* a language map of the representations of the title
+* a language map of strings used to sort the title of the publication
+* which title is the subtitle
+* a language map of the representations of the subtitle
 
 ### EPUB 2.x
 
 The first `` element should be considered the primary one.
 
-To determine the language of the `title` element, check:
+Parse it as a [localized string](#localized-strings) to compute a language map.
 
-1. if it has an `xml:lang` attribute;
-2. if it shares an `xml:lang` attribute (i.e. it is present on the `package` element);
-3. the primary language of the publication.
-
-The string for `sortAs` is the value of `content` in a `meta` whose `name` is `calibre:title_sort` and `content` is the value to use.
+The sorting key of the publication is given by the `content` attribute of the `meta` element whose `name` is `calibre:title_sort`.
 
 The subtitle can’t be expressed.
 
-To determine the default language for metadata, check:
-
-1. if the `package` has an `xml:lang` attribute;
-2. the primary language of the publication.
-
 ### EPUB 3.x
 
 The primary `title` is defined using the following logic:
@@ -46,20 +51,12 @@ The primary `title` is defined using the following logic:
 1. it is the `` element whose `title-type` (refine) is `main`;
 2. if there is no such refine, it is the first `` element.
 
-To determine the language of the `title` element, check
-
-1. if it has an `xml:lang` attribute;
-2. if it shares an `xml:lang` attribute (i.e. it is present on the `package` element);
-3. the primary language of the publication.
-
-The string used to sort the `title` of the publication is the value of the main title’s refine whose `property` is `file-as`.
-
-The subtitle of the publication is the value of the `` element whose `title-type` (refine) is `subtitle`. In case there are several, check their `display-seq` (refine).
+Parse it as a [localized string](#localized-strings) to compute a language map.
 
-To determine the default language for metadata, check:
+The sorting key of the publication is carried by the main title’s refine whose `property` is `file-as`. If there is none, fall back to the EPUB 2.x case.
 
-1. if the `package` has an `xml:lang` attribute;
-2. the primary language of the publication.
+The subtitle is the value of the `` element whose `title-type` (refine) is `subtitle`. If there are several, use the one with the lowest `display-seq` (refine).
+Parse it as a [localized string](#localized-strings) to compute a language map.
 
 ## Identifier
 
@@ -104,61 +101,64 @@ The valid URI is the result of this second step e.g. `urn:isbn:123456789X`.
 
 The contributor’s key depend on the role of the creator or contributor. It is an object that contains a `name`, a `sortAs` and an `identifier` key.
 
-The `name` of each `contributor` is an object where each key is a BCP 47 language tag and each value of the key is a string.
+The `name` and `sortAs` keys of each `contributor` are objects where each key is a BCP 47 language tag and each value is a string.
 
-The contributor object may also contain a `sortAs` string, used to sort the contributor as well, and an `identifier` string that must be a valid URI.
+The contributor object may also contain an `identifier` string that must be a valid URI.
 
 When parsing an EPUB, we need to establish:
 
 * the key of the contributor;
-* the name of this contributor;
-* the alternate forms for this name;
-* the string used to sort the name of the contributor.
+* a language map for the name of this contributor;
+* a language map used to sort the name of the contributor.
 
 ### EPUB 2.x
 
 The following mapping should be used to determine the key of the contributor’s object:
 
-| element        | opf:role               | key         |
-|----------------|------------------------|-------------|
-| dc:creator     | aut                    | author      |
-| dc:contributor | trl                    | translator  |
-| dc:contributor | est                    | editor      |
-| dc:contributor | ill                    | illustrator |
-| dc:contributor | art                    | artist      |
-| dc:contributor | clr                    | colorist    |
-| dc:contributor | nrt                    | narrator    |
-| dc:contributor | \ or \                 | contributor |
+| element                      | opf:role                 | key         |
+|------------------------------|--------------------------|-------------|
+| dc:creator                   | \ or \                   | author      |
+| dc:contributor               | \ or \                   | contributor |
+| dc:creator or dc:contributor | aut                      | author      |
+| dc:creator or dc:contributor | pbl                      | publisher   |
+| dc:creator or dc:contributor | trl                      | translator  |
+| dc:creator or dc:contributor | edt                      | editor      |
+| dc:creator or dc:contributor | ill                      | illustrator |
+| dc:creator or dc:contributor | art                      | artist      |
+| dc:creator or dc:contributor | clr                      | colorist    |
+| dc:creator or dc:contributor | nrt                      | narrator    |
+| dc:publisher                 | N/A                      | publisher   |
 
 Where `opf:role` is the value of the attribute of the ``.
 
-The `name` of the contributor is the value of the element.
+Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.
 
-Finally, the string used to sort the name of the contributor is the value of the `opf:file-as` attribute of this element.
+Finally, the string used to sort the name of the contributor is provided by the value of the `opf:file-as` attribute of this element.
 
 ### EPUB 3.x
 
-The following mapping should be used to determine to key of the contributor’s object:
-
-| element        | role                   | key         |
-|----------------|------------------------|-------------|
-| dc:creator     | aut                    | author      |
-| dc:contributor | trl                    | translator  |
-| dc:contributor | est                    | editor      |
-| dc:contributor | ill                    | illustrator |
-| dc:contributor | art                    | artist      |
-| dc:contributor | clr                    | colorist    |
-| dc:contributor | nrt                    | narrator    |
-| dc:contributor | \ or \                 | contributor |
+The following mapping should be used to determine the key of the contributor’s object:
+
+| element                      | role                     | key         |
+|------------------------------|--------------------------|-------------|
+| dc:creator                   | \ or \                   | author      |
+| dc:contributor               | \ or \                   | contributor |
+| dc:creator or dc:contributor | aut                      | author      |
+| dc:creator or dc:contributor | pbl                      | publisher   |
+| dc:creator or dc:contributor | trl                      | translator  |
+| dc:creator or dc:contributor | edt                      | editor      |
+| dc:creator or dc:contributor | ill                      | illustrator |
+| dc:creator or dc:contributor | art                      | artist      |
+| dc:creator or dc:contributor | clr                      | colorist    |
+| dc:creator or dc:contributor | nrt                      | narrator    |
+| dc:publisher                 | N/A                      | publisher   |
+| media:narrator               | N/A                      | narrator    |
 
 Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.
 
-To handle the `name` of the contributor:
-
-1. check if there is a refine whose propery is `alternate-script` and its corresponding `xml:lang` value;
-2. if there is none, use the value of the ``.
+Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.
 
-Finally, the string used to sort the name of the contributor is the value of a refine with a `file-as` property.
+Finally, the string used to sort the name of the contributor is carried by the contributor’s refine whose `property` is `file-as`.
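The EPUB 2.x and EPUB 3.x mapping tables above collapse into a single lookup. A minimal sketch, assuming the qualified element name and the role code (from `opf:role` or from the `marc:relators` refine) have already been extracted; `contributor_key` and `ROLE_TO_KEY` are hypothetical names. The tables do not say what to do with role codes they do not list, so the fallback to `contributor` below is an assumption.

```python
# MARC relator codes listed in the mapping tables, with their RWPM keys.
ROLE_TO_KEY = {
    "aut": "author",
    "pbl": "publisher",
    "trl": "translator",
    "edt": "editor",
    "ill": "illustrator",
    "art": "artist",
    "clr": "colorist",
    "nrt": "narrator",
}


def contributor_key(element, role=None):
    """Return the RWPM key for a contributor element and its optional role code.

    `element` is a qualified name such as "dc:creator"; `role` comes from
    opf:role (EPUB 2.x) or from the marc:relators refine (EPUB 3.x).
    """
    # Elements whose key does not depend on a role.
    if element == "dc:publisher":
        return "publisher"
    if element == "media:narrator":  # only occurs in EPUB 3.x
        return "narrator"

    role = (role or "").strip()
    if not role:
        # Empty or missing role: creators are authors, everything else is a contributor.
        return "author" if element == "dc:creator" else "contributor"

    # Assumption: codes not listed in the tables fall back to "contributor".
    return ROLE_TO_KEY.get(role, "contributor")
```

For instance, `contributor_key("dc:creator", "trl")` returns `"translator"`, while `contributor_key("dc:contributor")` returns `"contributor"`.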
 
 ## Language
 
@@ -187,12 +187,6 @@ The `description` of a publication is a key whose value is a string in plain tex
 
 The string is the value of the `` element.
 
-## Publisher
-
-The `publisher` of a publication is a key whose value is a string.
-
-The string is the value of the `` element.
-
 ## Publication Date
 
 The `published` date of a publication is a key whose value is a string conforming to ISO 8601.
@@ -219,16 +213,24 @@ The string is the value of the `meta` element whose `property` attribute has the
 
 ## Subjects
 
-The `subject` of a publication is a key whose value is string or an array.
+The `subject` of a publication is a key whose value is, in its most complex form, an array of `subject` objects.
 
 Although each subject should have its own `` element, this is not necessarily the case in practice, authors and authoring tools often separating multiple subjects using commas or semicolons in the same element.
+Therefore, if there is a single `dc:subject` element that is not refined by any property, split its content at every comma and semicolon, and treat the result as several `dc:subject` elements that share the same attributes.
+
+Parse each `` element as a [localized string](#localized-strings) to compute a language map for the subject’s `name`.
 
-To retrive the value of the `subject` key:
+### EPUB 2.x
 
-1. if there is a one single `` element, make sure keywords are not separated using commas or semicolons;
-   1. if it doesn’t, the string is the value;
-   2. if it does, split the string to build an array;
-2. if there are more than one `` elements, build an array using their values.
+`sortAs`, `code` and `scheme` cannot be expressed.
+
+### EPUB 3.x
+
+The `sortAs` string used to sort the subject is the value of the refine whose `property` has the value of `file-as`.
+
+The `code` property has the same value as the refine whose `property` has the value of `term`.
+
+The `scheme` property has the same value as the refine whose `property` has the value of `authority`.
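The subject-splitting rule in this hunk can be sketched as follows, assuming each `dc:subject` has already been reduced to its text, its resolved language and a flag telling whether any `meta` refines it. `RawSubject` and `parse_subjects` are hypothetical names, and dropping empty pieces (e.g. from a trailing comma) is an assumption the text does not spell out. Alternate scripts, `sortAs`, `code` and `scheme` are left out for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class RawSubject:
    """A dc:subject element after its language has been resolved (hypothetical model)."""
    text: str
    lang: str = "und"      # language in scope, as for any localized string
    refined: bool = False  # True if at least one meta element refines this dc:subject


def parse_subjects(raw):
    """Turn dc:subject elements into RWPM subject objects with a `name` language map."""
    if len(raw) == 1 and not raw[0].refined:
        # A single, unrefined dc:subject may hold a comma/semicolon-separated list:
        # split it into several subjects that share the original attributes.
        only = raw[0]
        raw = [
            RawSubject(part.strip(), only.lang)
            for part in re.split(r"[,;]", only.text)
            if part.strip()  # assumption: empty pieces are dropped
        ]
    return [{"name": {s.lang: s.text.strip()}} for s in raw]
```

With this sketch, `parse_subjects([RawSubject("Fiction; Fantasy, Dragons", "en")])` yields three subjects whose `name` maps are `{"en": "Fiction"}`, `{"en": "Fantasy"}` and `{"en": "Dragons"}`; the same input marked `refined=True` is kept as a single subject, and several `dc:subject` elements are never split.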
 
 ## Collections and Series
 
@@ -269,6 +271,8 @@ The `identifier` string is the value of the refine whose `property` has the valu
 
 The `position` of the publication is the value of the refine whose `property` has the value of `group-position`.
 
+If there is no `series`, try to parse `calibre:series` as in the EPUB 2.x case.
+
 ## Progression Direction
 
 The `readingProgression` of a publication is a key whose value is a string amongst the following:
@@ -503,4 +507,4 @@ For each spine item, the value of `page` must be inferred from the `properties` 
 |-------------------------------|---------|
 | rendition:page-spread-center  | center  |
 | rendition:page-spread-left    | left    |
-| rendition:page-spread-right   | right   |
\ No newline at end of file
+| rendition:page-spread-right   | right   |