From d5cf242b47818e442578194416a1edbe60bbc24d Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Mon, 13 Jan 2020 17:33:11 +0100
Subject: [PATCH 01/17] Fix a typo

---
 streamer/parser/metadata.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 3606f8b..214a77e 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -123,7 +123,7 @@ The following mapping should be used to determine the key of the contributor’s
 |----------------|------------------------|-------------|
 | dc:creator | aut | author |
 | dc:contributor | trl | translator |
-| dc:contributor | est | editor |
+| dc:contributor | edt | editor |
 | dc:contributor | ill | illustrator |
 | dc:contributor | art | artist |
 | dc:contributor | clr | colorist |

From c0779fbaf1868cc974098d1b5730a53bf1991b03 Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Fri, 20 Mar 2020 15:34:20 +0100
Subject: [PATCH 02/17] Fix inheritance rules for xml:lang

---
 streamer/parser/metadata.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 214a77e..443edcb 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -27,7 +27,7 @@ The first `` element should be considered the primary one.
 To determine the language of the `title` element, check:

 1. if it has an `xml:lang` attribute;
-2. if it shares an `xml:lang` attribute (i.e. it is present on the `package` element);
+2. if it shares an `xml:lang` attribute (i.e. it is present on the `metadata` or `package` element);
 3. the primary language of the publication.

 The string for `sortAs` is the value of `content` in a `meta` whose `name` is `calibre:title_sort` and `content` is the value to use.
@@ -49,7 +49,7 @@ The primary `title` is defined using the following logic:
 To determine the language of the `title` element, check

 1. if it has an `xml:lang` attribute;
-2. if it shares an `xml:lang` attribute (i.e. it is present on the `package` element);
+2. if it shares an `xml:lang` attribute (i.e. it is present on the `metadata` or `package` element);
 3. the primary language of the publication.

 The string used to sort the `title` of the publication is the value of the main title’s refine whose `property` is `file-as`.
@@ -503,4 +503,4 @@ For each spine item, the value of `page` must be inferred from the `properties`
 |-------------------------------|---------|
 | rendition:page-spread-center | center |
 | rendition:page-spread-left | left |
-| rendition:page-spread-right | right |
\ No newline at end of file
+| rendition:page-spread-right | right |

From 4f748eebf23ca01e72eeff0b4f0a03565bf1fc9b Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Fri, 20 Mar 2020 16:12:23 +0100
Subject: [PATCH 03/17] Clarify how language maps should be computed.

---
 streamer/parser/metadata.md | 52 +++++++++++++++----------------------
 1 file changed, 21 insertions(+), 31 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 443edcb..4cdf8cf 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -6,6 +6,17 @@ While the default context is very flexible in the way each metadata can be repre

 Related Repository: [Readium Web Publication Manifest](https://github.com/readium/webpub-manifest)

+## Localized Strings
+
+In many cases, the default context supports alternate representations of the same string in different scripts and languages by means of JSON-LD language maps.
+To fill such a map from an EPUB metadata element, proceed as follows:
+
+* Determine the language used in the content of the carrying element as defined in [the XML specification](https://www.w3.org/TR/xml/#sec-lang-tag),
+ i.e. check whether the carrying element has or inherits an `xml:lang` attribute. Otherwise, fallback to the primary language of the publication.
+* In the EPUB 3.x case, check if the element is refined by some `meta` elements that have or inherit an `xml:lang` attribute and whose property is `alternate-script`.
+ For each one, add to the map the corresponding language associated with the content of the `meta` element.
+
+
 ## Title

 The `title` of a publication is an object where each key is a BCP 47 language tag and each value of this key is a string.
@@ -15,30 +26,21 @@ In addition to `title`, a publication may also contain a `sortAs` string, used t
 When parsing an EPUB, we need to establish:

 * which title is the primary one
-* the language(s) used to express the primary title along with the associated strings
 * the string used to sort the title of the publication
-* the subtitle of the publication
-* the default language for metadata
+* a language map of the representations of the title
+* which title is the subtitle
+* a language map of the representations of the title

 ### EPUB 2.x

 The first `` element should be considered the primary one.

-To determine the language of the `title` element, check:
-
-1. if it has an `xml:lang` attribute;
-2. if it shares an `xml:lang` attribute (i.e. it is present on the `metadata` or `package` element);
-3. the primary language of the publication.
+Parse it as a [localized string](#localized-strings) to compute a language map.

 The string for `sortAs` is the value of `content` in a `meta` whose `name` is `calibre:title_sort` and `content` is the value to use.

 The subtitle can’t be expressed.

-To determine the default language for metadata, check:
-
-1. if the `package` has an `xml:lang` attribute;
-2. the primary language of the publication.
-
 ### EPUB 3.x

 The primary `title` is defined using the following logic:
@@ -46,20 +48,12 @@ The primary `title` is defined using the following logic:
 1. it is the `` element whose `title-type` (refine) is `main`;
 2. if there is no such refine, it is the first `` element.

-To determine the language of the `title` element, check
-
-1. if it has an `xml:lang` attribute;
-2. if it shares an `xml:lang` attribute (i.e. it is present on the `metadata` or `package` element);
-3. the primary language of the publication.
+Parse it as a [localized string](#localized-strings) to compute a language map.

 The string used to sort the `title` of the publication is the value of the main title’s refine whose `property` is `file-as`.

-The subtitle of the publication is the value of the `` element whose `title-type` (refine) is `subtitle`. In case there are several, check their `display-seq` (refine).
-
-To determine the default language for metadata, check:
-
-1. if the `package` has an `xml:lang` attribute;
-2. the primary language of the publication.
+The subtitle is the `` element whose `title-type` (refine) is `subtitle`. In case there are several, use the one with the lowest `display-seq` (refine).
+Parse it as a [localized string](#localized-strings) to compute a language map.

 ## Identifier

@@ -111,8 +105,7 @@ The contributor object may also contain a `sortAs` string, used to sort the cont
 When parsing an EPUB, we need to establish:

 * the key of the contributor;
-* the name of this contributor;
-* the alternate forms for this name;
+* a language map for the name of this contributor;
 * the string used to sort the name of the contributor.

 ### EPUB 2.x
@@ -132,7 +125,7 @@ The following mapping should be used to determine the key of the contributor’s

 Where `opf:role` is the value of the attribute of the ``.

-The `name` of the contributor is the value of the element.
+Parse the carrying element as a [localized string](#localized-strings) to compute a language map for his name.

 Finally, the string used to sort the name of the contributor is the value of the `opf:file-as` attribute of this element.

@@ -153,10 +146,7 @@ The following mapping should be used to determine to key of the contributor’s

 Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.

-To handle the `name` of the contributor:
-
-1. check if there is a refine whose propery is `alternate-script` and its corresponding `xml:lang` value;
-2. if there is none, use the value of the ``.
+Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for his name.

 Finally, the string used to sort the name of the contributor is the value of a refine with a `file-as` property.

From 605a51163f68d8383d5a0ad2e8499aabd9a31296 Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Sat, 21 Mar 2020 10:02:19 +0100
Subject: [PATCH 04/17] Precise rules for determining contributor's key

---
 streamer/parser/metadata.md | 58 +++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 4cdf8cf..bef4004 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -112,20 +112,23 @@ When parsing an EPUB, we need to establish:

 The following mapping should be used to determine the key of the contributor’s object:

-| element | opf:role | key |
-|----------------|------------------------|-------------|
-| dc:creator | aut | author |
-| dc:contributor | trl | translator |
-| dc:contributor | edt | editor |
-| dc:contributor | ill | illustrator |
-| dc:contributor | art | artist |
-| dc:contributor | clr | colorist |
-| dc:contributor | nrt | narrator |
-| dc:contributor | \ or \ | contributor |
+| element | opf:role | key |
+|----------------|---------------------------------|-------------|
+| dc:creator | aut or \ or \ | author |
+| dc:publisher | pbl or \ or \ | publisher |
+| dc:contributor | trl | translator |
+| dc:contributor | edt | editor |
+| dc:contributor | ill | illustrator |
+| dc:contributor | art | artist |
+| dc:contributor | clr | colorist |
+| dc:contributor | nrt | narrator |
+| dc:contributor | \ or \ | contributor |

 Where `opf:role` is the value of the attribute of the ``.

-Parse the carrying element as a [localized string](#localized-strings) to compute a language map for his name.
+In case of conflict, `opf:role` overrides the XML element used. So, for example, map a `` or `` element with `opf:role` aut to an author.
+
+Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor's name.

 Finally, the string used to sort the name of the contributor is the value of the `opf:file-as` attribute of this element.

@@ -133,20 +136,25 @@ Finally, the string used to sort the name of the contributor is the value of the

 The following mapping should be used to determine to key of the contributor’s object:

-| element | role | key |
-|----------------|------------------------|-------------|
-| dc:creator | aut | author |
-| dc:contributor | trl | translator |
-| dc:contributor | est | editor |
-| dc:contributor | ill | illustrator |
-| dc:contributor | art | artist |
-| dc:contributor | clr | colorist |
-| dc:contributor | nrt | narrator |
-| dc:contributor | \ or \ | contributor |
+| element | role | key |
+|----------------|---------------------------------|-------------|
+| dc:creator | aut or \ or \ | author |
+| dc:publisher | pbl or \ or \ | publisher |
+| dc:contributor | trl | translator |
+| dc:contributor | edt | editor |
+| dc:contributor | ill | illustrator |
+| dc:contributor | art | artist |
+| dc:contributor | clr | colorist |
+| dc:contributor | nrt | narrator |
+| media:narrator | nrt or \ or \ | narrator |
+| dc:contributor | \ or \ | contributor |
+

 Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.

-Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for his name.
+In case of conflict, `role` overrides the XML element used. So, for example, map a `` or `` element with `role` aut to an author.
+
+Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for the contributor's name.

 Finally, the string used to sort the name of the contributor is the value of a refine with a `file-as` property.

@@ -177,12 +185,6 @@ The `description` of a publication is a key whose value is a string in plain tex

 The string is the value of the `` element.

-## Publisher
-
-The `publisher` of a publication is a key whose value is a string.
-
-The string is the value of the `` element.
-
 ## Publication Date

 The `published` date of a publication is a key whose value is a string conforming to ISO 8601.

From 7ef0d76b87c1d399a2d6c4eba2070f7fd629e8d1 Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Sat, 21 Mar 2020 10:54:29 +0100
Subject: [PATCH 05/17] Update the rules for parsing subjects

Fix #111.
---
 streamer/parser/metadata.md | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index bef4004..a752c9a 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -211,16 +211,24 @@ The string is the value of the `meta` element whose `property` attribute has the

 ## Subjects

-The `subject` of a publication is a key whose value is string or an array.
+The `subject` of a publication is a key whose value is, in the most complex form, an array of `subject` objects.

-Although each subject should have its own `` element, this is not necessarily the case in practice, authors and authoring tools often separating multiple subjects using commas or semicolons in the same element.
+Although each subject should have its own `dc:subject` element, this is not necessarily the case in practice, authors and authoring tools often separating multiple subjects using commas or semicolons in the same element.
+So, if there is a single `dc:subject` that is not refined by any property, split its content at every comma and semicolon and consider you have several `dc:subject` with shared attributes.

-To retrive the value of the `subject` key:
+Parse each `dc:subject` element as a [localized string](#localized-strings) to compute a language map for the subject's `name`.

-1. if there is a one single `` element, make sure keywords are not separated using commas or semicolons;
- 1. if it doesn’t, the string is the value;
- 2. if it does, split the string to build an array;
-2. if there are more than one `` elements, build an array using their values.
+### EPUB 2.x
+
+`sortAs`, `code` and `scheme` cannot be expressed.
+
+### EPUB 3.x
+
+The `sortAs` string used to sort the subject is the value of the refine whose `property` has the value of `file-as`.
+
+The `code` property has the same value as the refine whose `property` has the value of `term`.
+
+The `scheme` property has the same value as the refine whose `property` has the value of `authority`.

 ## Collections and Series

From b8093826d1c8f3847d6679730ef52e6904cfaaa3 Mon Sep 17 00:00:00 2001
From: qnga <32197639+qnga@users.noreply.github.com>
Date: Sun, 22 Mar 2020 19:47:02 +0100
Subject: [PATCH 06/17] Update streamer/parser/metadata.md

Co-Authored-By: Jiminy Panoz
---
 streamer/parser/metadata.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index a752c9a..92081b9 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -52,7 +52,7 @@ Parse it as a [localized string](#localized-strings) to compute a language map.

 The string used to sort the `title` of the publication is the value of the main title’s refine whose `property` is `file-as`.

-The subtitle is the `` element whose `title-type` (refine) is `subtitle`. In case there are several, use the one with the lowest `display-seq` (refine).
+The subtitle is the value of the `` element whose `title-type` (refine) is `subtitle`. In case there are several, use the one with the lowest `display-seq` (refine).
 Parse it as a [localized string](#localized-strings) to compute a language map.

 ## Identifier

From 1f23ad225de03588332eeb26ca60cee01d95a3b9 Mon Sep 17 00:00:00 2001
From: qnga <32197639+qnga@users.noreply.github.com>
Date: Sun, 22 Mar 2020 19:47:50 +0100
Subject: [PATCH 07/17] Update streamer/parser/metadata.md

Co-Authored-By: Jiminy Panoz
---
 streamer/parser/metadata.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 92081b9..02544a1 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -126,7 +126,7 @@ The following mapping should be used to determine the key of the contributor’s

 Where `opf:role` is the value of the attribute of the ``.

-In case of conflict, `opf:role` overrides the XML element used. So, for example, map a `` or `` element with `opf:role` aut to an author.
+In case of conflict, `opf:role` overrides the XML element used. For example, map a `` or `` element whose value for `opf:role` is `aut` to an author.

 Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor's name.

From 60592ca8a5a9e41d9ce5a8320d64aa2135c4eea0 Mon Sep 17 00:00:00 2001
From: qnga <32197639+qnga@users.noreply.github.com>
Date: Sun, 22 Mar 2020 19:50:02 +0100
Subject: [PATCH 08/17] Update streamer/parser/metadata.md

Co-Authored-By: Jiminy Panoz
---
 streamer/parser/metadata.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 02544a1..f11b1ef 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -152,7 +152,7 @@ The following mapping should be used to determine to key of the contributor’s

 Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.

-In case of conflict, `role` overrides the XML element used. So, for example, map a `` or `` element with `role` aut to an author.
+In case of conflict, `role` overrides the XML element used. So, for example, map a `` or `` element whose value for `role` is `aut` to an author.

 Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for the contributor's name.

From 2d1d56e731d1fea2be0618b7b6c0cf7ebc9c17f9 Mon Sep 17 00:00:00 2001
From: qnga <32197639+qnga@users.noreply.github.com>
Date: Sun, 22 Mar 2020 19:53:50 +0100
Subject: [PATCH 09/17] Fix apostrophes

Co-Authored-By: Jiminy Panoz
---
 streamer/parser/metadata.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index f11b1ef..2a976f8 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -128,7 +128,7 @@ Where `opf:role` is the value of the attribute of the ``.

 In case of conflict, `opf:role` overrides the XML element used. For example, map a `` or `` element whose value for `opf:role` is `aut` to an author.

-Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor's name.
+Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.

 Finally, the string used to sort the name of the contributor is the value of the `opf:file-as` attribute of this element.

@@ -154,7 +154,7 @@ Where `role` is the value of the refine whose `scheme` is a value of `marc:relat

 In case of conflict, `role` overrides the XML element used. So, for example, map a `` or `` element whose value for `role` is `aut` to an author.

-Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for the contributor's name.
+Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.

 Finally, the string used to sort the name of the contributor is the value of a refine with a `file-as` property.

From 5d7f48744ce7d6f99bb29ec54acdb78a6dc70304 Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Sun, 22 Mar 2020 19:54:32 +0100
Subject: [PATCH 10/17] Fix a typo

---
 streamer/parser/metadata.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 2a976f8..37b8baf 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -29,7 +29,7 @@ When parsing an EPUB, we need to establish:
 * the string used to sort the title of the publication
 * a language map of the representations of the title
 * which title is the subtitle
-* a language map of the representations of the title
+* a language map of the representations of the subtitle

 ### EPUB 2.x

From 1eeb42803e83ffd248323685ddeebeaf05666537 Mon Sep 17 00:00:00 2001
From: qnga <32197639+qnga@users.noreply.github.com>
Date: Sun, 22 Mar 2020 19:56:55 +0100
Subject: [PATCH 11/17] Apply suggestions from code review

Co-Authored-By: Jiminy Panoz
---
 streamer/parser/metadata.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 37b8baf..8e38f17 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -213,10 +213,10 @@ The string is the value of the `meta` element whose `property` attribute has the

 The `subject` of a publication is a key whose value is, in the most complex form, an array of `subject` objects.

-Although each subject should have its own `dc:subject` element, this is not necessarily the case in practice, authors and authoring tools often separating multiple subjects using commas or semicolons in the same element.
+Although each subject should have its own `` element, this is not necessarily the case in practice, authors and authoring tools often separating multiple subjects using commas or semicolons in the same element.
 So, if there is a single `dc:subject` that is not refined by any property, split its content at every comma and semicolon and consider you have several `dc:subject` with shared attributes.

-Parse each `dc:subject` element as a [localized string](#localized-strings) to compute a language map for the subject's `name`.
+Parse each `` element as a [localized string](#localized-strings) to compute a language map for the subject’s `name`.

 ### EPUB 2.x

From b6503821e00f95a9393d966b9dd1fd6e682770d4 Mon Sep 17 00:00:00 2001
From: qnga <32197639+qnga@users.noreply.github.com>
Date: Tue, 24 Mar 2020 10:35:26 +0100
Subject: [PATCH 12/17] Improve the table for contributor parsing

Close #126

Co-Authored-By: Jiminy Panoz
---
 streamer/parser/metadata.md | 51 ++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 8e38f17..a20dfa8 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -112,17 +112,19 @@ When parsing an EPUB, we need to establish:

 The following mapping should be used to determine the key of the contributor’s object:

-| element | opf:role | key |
-|----------------|---------------------------------|-------------|
-| dc:creator | aut or \ or \ | author |
-| dc:publisher | pbl or \ or \ | publisher |
-| dc:contributor | trl | translator |
-| dc:contributor | edt | editor |
-| dc:contributor | ill | illustrator |
-| dc:contributor | art | artist |
-| dc:contributor | clr | colorist |
-| dc:contributor | nrt | narrator |
-| dc:contributor | \ or \ | contributor |
+| element | opf:role | key |
+|------------------------------|--------------------------|-------------|
+| dc:creator | \ or \ | author |
+| dc:creator or dc:contributor | aut | author |
+| dc:contributor | \ or \ | contributor |
+| dc:publisher | \ | publisher |
+| dc:creator or dc:contributor | pbl | publisher |
+| dc:creator or dc:contributor | trl | translator |
+| dc:creator or dc:contributor | edt | editor |
+| dc:creator or dc:contributor | ill | illustrator |
+| dc:creator or dc:contributor | art | artist |
+| dc:creator or dc:contributor | clr | colorist |
+| dc:creator or dc:contributor | nrt | narrator |

 Where `opf:role` is the value of the attribute of the ``.

@@ -136,19 +138,20 @@ Finally, the string used to sort the name of the contributor is the value of the

 The following mapping should be used to determine to key of the contributor’s object:

-| element | role | key |
-|----------------|---------------------------------|-------------|
-| dc:creator | aut or \ or \ | author |
-| dc:publisher | pbl or \ or \ | publisher |
-| dc:contributor | trl | translator |
-| dc:contributor | edt | editor |
-| dc:contributor | ill | illustrator |
-| dc:contributor | art | artist |
-| dc:contributor | clr | colorist |
-| dc:contributor | nrt | narrator |
-| media:narrator | nrt or \ or \ | narrator |
-| dc:contributor | \ or \ | contributor |
-
+| element | role | key |
+|------------------------------|--------------------------|-------------|
+| dc:creator | \ or \ | author |
+| dc:creator or dc:contributor | aut | author |
+| dc:contributor | \ or \ | contributor |
+| dc:publisher | \ | publisher |
+| dc:creator or dc:contributor | pbl | publisher |
+| dc:creator or dc:contributor | trl | translator |
+| dc:creator or dc:contributor | edt | editor |
+| dc:creator or dc:contributor | ill | illustrator |
+| dc:creator or dc:contributor | art | artist |
+| dc:creator or dc:contributor | clr | colorist |
+| dc:creator or dc:contributor | nrt | narrator |
+| media:narrator | \ | narrator |

 Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.

From 05739eec31ad4bd637e4da17235c777fdbdbb86c Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Wed, 25 Mar 2020 08:14:05 +0100
Subject: [PATCH 13/17] Remove the paragraph about role overriding

---
 streamer/parser/metadata.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index a20dfa8..32d1da9 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -128,8 +128,6 @@ The following mapping should be used to determine the key of the contributor’s

 Where `opf:role` is the value of the attribute of the ``.

-In case of conflict, `opf:role` overrides the XML element used. For example, map a `` or `` element whose value for `opf:role` is `aut` to an author.
-
 Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.

 Finally, the string used to sort the name of the contributor is the value of the `opf:file-as` attribute of this element.
@@ -155,8 +153,6 @@ The following mapping should be used to determine to key of the contributor’s

 Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.

-In case of conflict, `role` overrides the XML element used. So, for example, map a `` or `` element whose value for `role` is `aut` to an author.
-
 Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.

 Finally, the string used to sort the name of the contributor is the value of a refine with a `file-as` property.

From e6045cacfbcafaa2844a10eff7fcedb9886e8b6d Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Thu, 26 Mar 2020 18:18:20 +0100
Subject: [PATCH 14/17] Change the order in the contributor table and make
 explicit cases when role is not applicable

---
 streamer/parser/metadata.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 32d1da9..75d1bc9 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -115,9 +115,8 @@ The following mapping should be used to determine the key of the contributor’s
 | element | opf:role | key |
 |------------------------------|--------------------------|-------------|
 | dc:creator | \ or \ | author |
-| dc:creator or dc:contributor | aut | author |
 | dc:contributor | \ or \ | contributor |
-| dc:publisher | \ | publisher |
+| dc:creator or dc:contributor | aut | author |
 | dc:creator or dc:contributor | pbl | publisher |
 | dc:creator or dc:contributor | trl | translator |
 | dc:creator or dc:contributor | edt | editor |
@@ -125,6 +124,7 @@ The following mapping should be used to determine the key of the contributor’s
 | dc:creator or dc:contributor | art | artist |
 | dc:creator or dc:contributor | clr | colorist |
 | dc:creator or dc:contributor | nrt | narrator |
+| dc:publisher | N/A | publisher |

 Where `opf:role` is the value of the attribute of the ``.

@@ -139,9 +139,8 @@ The following mapping should be used to determine to key of the contributor’s
 | element | role | key |
 |------------------------------|--------------------------|-------------|
 | dc:creator | \ or \ | author |
-| dc:creator or dc:contributor | aut | author |
 | dc:contributor | \ or \ | contributor |
-| dc:publisher | \ | publisher |
+| dc:creator or dc:contributor | aut | author |
 | dc:creator or dc:contributor | pbl | publisher |
 | dc:creator or dc:contributor | trl | translator |
 | dc:creator or dc:contributor | edt | editor |
@@ -149,7 +148,8 @@ The following mapping should be used to determine to key of the contributor’s
 | dc:creator or dc:contributor | art | artist |
 | dc:creator or dc:contributor | clr | colorist |
 | dc:creator or dc:contributor | nrt | narrator |
-| media:narrator | \ | narrator |
+| dc:publisher | N/A | publisher |
+| media:narrator | N/A | narrator |

 Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.

From 860047477c0d3dd866fe20eba6d41de25d6f4489 Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Fri, 1 May 2020 09:58:13 +0200
Subject: [PATCH 15/17] Remove fallback to dc:language for metadata language

---
 streamer/parser/metadata.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 75d1bc9..185378f 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -12,9 +12,10 @@ In many cases, the default context supports alternate representations of the sam
 To fill such a map from an EPUB metadata element, proceed as follows:

 * Determine the language used in the content of the carrying element as defined in [the XML specification](https://www.w3.org/TR/xml/#sec-lang-tag),
- i.e. check whether the carrying element has or inherits an `xml:lang` attribute. Otherwise, fallback to the primary language of the publication.
+ i.e. check whether the carrying element has or inherits an `xml:lang` attribute.
 * In the EPUB 3.x case, check if the element is refined by some `meta` elements that have or inherit an `xml:lang` attribute and whose property is `alternate-script`.
 For each one, add to the map the corresponding language associated with the content of the `meta` element.
+* When no language hint is available, use `null` or `und` depending on the platform.


 ## Title

From e9ba51589e6dc46c9c868375759c5bf46c0505a0 Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Fri, 1 May 2020 11:12:30 +0200
Subject: [PATCH 16/17] Update file-as parsing so that sortAs is a localized string

---
 streamer/parser/metadata.md | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 185378f..0e53861 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -17,18 +17,20 @@ To fill such a map from an EPUB metadata element, proceed as follows:
 For each one, add to the map the corresponding language associated with the content of the `meta` element.
 * When no language hint is available, use `null` or `und` depending on the platform.

+## Sorting keys
+
+Localized sorting keys are supported in RWPM for publication title, contributor/collection' names and subject' names. While computing the localized string, use the language of the carrying element as defined in [the XML specification](https://www.w3.org/TR/xml/#sec-lang-tag) and fallback to `null` or `und`.

-## Title

-The `title` of a publication is an object where each key is a BCP 47 language tag and each value of this key is a string.
+## Title

-In addition to `title`, a publication may also contain a `sortAs` string, used to sort the title as well.
+The `title` and `sortAs` keys of a publication are objects where each key is a BCP 47 language tag and each value of this key is a string.

 When parsing an EPUB, we need to establish:

 * which title is the primary one
-* the string used to sort the title of the publication
 * a language map of the representations of the title
+* a language map of strings used to sort the title of the publication
 * which title is the subtitle
 * a language map of the representations of the subtitle

@@ -38,7 +40,7 @@ The first `` element should be considered the primary one.

 Parse it as a [localized string](#localized-strings) to compute a language map.

-The string for `sortAs` is the value of `content` in a `meta` whose `name` is `calibre:title_sort` and `content` is the value to use.
+The value of sorting key of the publication is given by the `content` attribute in a `meta` whose `name` is `calibre:title_sort`.

 The subtitle can’t be expressed.

@@ -51,7 +53,7 @@ The primary `title` is defined using the following logic:

 Parse it as a [localized string](#localized-strings) to compute a language map.

-The string used to sort the `title` of the publication is the value of the main title’s refine whose `property` is `file-as`.
+The sorting key of the publication is carried by the main title’s refine whose `property` is `file-as`.

 The subtitle is the value of the `` element whose `title-type` (refine) is `subtitle`. In case there are several, use the one with the lowest `display-seq` (refine). Parse it as a [localized string](#localized-strings) to compute a language map.

@@ -99,15 +101,15 @@ The valid URI is the result of this second step e.g. `urn:isbn:123456789X`.

 The contributor’s key depend on the role of the creator or contributor. It is an object that contains a `name`, a `sortAs` and an `identifier` key.

-The `name` of each `contributor` is an object where each key is a BCP 47 language tag and each value of the key is a string.
+The `name` and `sortAs` keys of each `contributor` are objects where each key is a BCP 47 language tag and each value of the key is a string.
-The contributor object may also contain a `sortAs` string, used to sort the contributor as well, and an `identifier` string that must be a valid URI.
+The contributor object may also contain an `identifier` string that must be a valid URI.

 When parsing an EPUB, we need to establish:

 * the key of the contributor;
 * a language map for the name of this contributor;
-* the string used to sort the name of the contributor.
+* a language map used to sort the name of the contributor.

 ### EPUB 2.x

@@ -131,11 +133,11 @@ Where `opf:role` is the value of the attribute of the ``.

 Parse the carrying element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.

-Finally, the string used to sort the name of the contributor is the value of the `opf:file-as` attribute of this element.
+Finally, the string used to sort the name of the contributor is provided by the value of the `opf:file-as` attribute of this element.

 ### EPUB 3.x

-The following mapping should be used to determine to key of the contributor’s object:
+The following mapping should be used to determine to key of the contributor’s object:

 | element                      | role                     | key         |
 |------------------------------|--------------------------|-------------|
@@ -156,7 +158,7 @@ Where `role` is the value of the refine whose `scheme` is a value of `marc:relators`.

 Parse the `contributor` element as a [localized string](#localized-strings) to compute a language map for the contributor’s name.

-Finally, the string used to sort the name of the contributor is the value of a refine with a `file-as` property.
+Finally, the string used to sort the name of the contributor is carried by the contributor's refine whose property is `file-as`.
 ## Language

From 776ea1dbd8ad4bb85adaf54773df40b03feab640 Mon Sep 17 00:00:00 2001
From: Quentin Gliosca <32197639+qnga@users.noreply.github.com>
Date: Fri, 1 May 2020 11:17:41 +0200
Subject: [PATCH 17/17] Add fallbacks to Epub 2 case for series and title's sorting key

---
 streamer/parser/metadata.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/streamer/parser/metadata.md b/streamer/parser/metadata.md
index 0e53861..4e72bf6 100644
--- a/streamer/parser/metadata.md
+++ b/streamer/parser/metadata.md
@@ -53,7 +53,7 @@ The primary `title` is defined using the following logic:

 Parse it as a [localized string](#localized-strings) to compute a language map.

-The sorting key of the publication is carried by the main title’s refine whose `property` is `file-as`.
+The sorting key of the publication is carried by the main title’s refine whose `property` is `file-as`. If there is none, fallback to the EPUB 2.x case.

 The subtitle is the value of the `` element whose `title-type` (refine) is `subtitle`. In case there are several, use the one with the lowest `display-seq` (refine). Parse it as a [localized string](#localized-strings) to compute a language map.

@@ -271,6 +271,8 @@ The `identifier` string is the value of the refine whose `property` has the valu

 The `position` of the publication is the value of the refine whose `property` has the value of `group-position`.

+If there is no `series`, try to parse `calibre:series` as in the EPUB 2.x case.
+
 ## Progression Direction

 The `readingProgression` of a publication is a key whose value is a string amongst the following: