[Annotations import/export]: support of the annotation selectors from w3c annotation spec #2625
Comments
This doesn't make sense. A CSSSelector references a DOM Element, and the |
Following an internal discussion with the team, I will try to explain how DOM Range serialisation works in r2-navigator-js. A DOM Range represents a span between a start node and an end node. These can be two TEXT_NODEs, with the start and end offsets counted in characters, or two ELEMENT_NODEs, with the start and end offsets counted as child indexes. r2-navigator-js serialises a DOM Range to an object of 6 values
A CSS selector cannot reference a TEXT_NODE, as Daniel said, so we have to specify the index of the TEXT_NODE relative to its parent element. That way the full expressiveness of a DOM Range is preserved and the range can be fully recreated. When an element is an ELEMENT_NODE : This is why we lose information with a CssSelector refined by a TextPositionSelector: it has to reconstruct the DOM structure from only a start and an end character position, without knowing which TEXT_NODE index is targeted. A TextPositionSelector has to traverse the tree and accumulate every text length recursively until it reaches the TEXT_NODE that contains the wanted position. XPath doesn't have this issue, since we can serialise any kind of node, such as a TEXT_NODE, with
I hope this is clearer now, at least it is for me. The question now is whether we should trust TextPositionSelector. |
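To make the traversal described above concrete, here is a minimal sketch of how a text position could be resolved back to a text node. It runs on a hypothetical in-memory tree (`TextNode`, `ElementNode`) rather than a real DOM, and `locateTextPosition` is an illustrative name, not a function of r2-navigator-js or apache-annotator. It also assumes the index is counted among all child nodes of the parent, which may differ from how r2-navigator-js counts it.

```typescript
// Hypothetical stand-ins for DOM nodes, so the sketch runs without a browser.
type TextNode = { kind: "text"; data: string };
type ElementNode = { kind: "element"; children: TreeNode[] };
type TreeNode = TextNode | ElementNode;

interface TextLocation {
  parent: ElementNode; // direct parent element of the text node
  childIndex: number;  // index of the text node among the parent's children (assumption)
  offset: number;      // UTF-16 code-unit offset inside the text node
}

// Walk the subtree in document order, accumulating text lengths until the
// requested position falls inside a text node. The position is treated as a
// start boundary: a position on the seam between two text nodes resolves to
// offset 0 of the later node.
function locateTextPosition(root: ElementNode, position: number): TextLocation | undefined {
  let consumed = 0; // text length seen so far

  const visit = (parent: ElementNode): TextLocation | undefined => {
    let index = 0;
    for (const child of parent.children) {
      if (child.kind === "text") {
        const end = consumed + child.data.length;
        if (position >= consumed && position < end) {
          return { parent, childIndex: index, offset: position - consumed };
        }
        consumed = end;
      } else {
        const found = visit(child);
        if (found) return found;
      }
      index++;
    }
    return undefined;
  };

  return visit(root);
}
```

The walk needs the whole subtree to be available, which is why this kind of selector cannot be resolved "offline": the same character position maps to different text nodes depending on the markup around it.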
Another question is how to import these annotation selectors, which need to be converted. Currently we can import a Readium annotation set (the .annotation format) from both the library and reader windows, and it is processed in the main process. If a selector cannot be mapped to IRangeInfo "offline" (without a mounted DOM), it will not be imported into the publication annotation list saved in the Thorium database. So we need an adapter that imports any selector and converts it to DOM Range info and then to r2-navigator-js. There are some constraints :
|
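As a rough illustration of the adapter idea above, the sketch below keeps only the selectors that some converter can map to IRangeInfo without a DOM and sets the others aside. Everything here is hypothetical (`Selector`, `OfflineConverter`, `importSelectors`); it is not Thorium's actual import code, and the `IRangeInfo` shape is copied from the field names used in this thread.

```typescript
// Field names taken from the rangeInfo objects discussed in this issue.
interface IRangeInfo {
  startContainerElementCssSelector: string;
  startContainerChildTextNodeIndex: number;
  startOffset: number;
  endContainerElementCssSelector: string;
  endContainerChildTextNodeIndex: number;
  endOffset: number;
}

// Loosely typed selector as it would arrive from an imported annotation set.
interface Selector {
  type: string;
  [key: string]: unknown;
}

// A converter returns undefined when it cannot build a range without a DOM.
type OfflineConverter = (selector: Selector) => IRangeInfo | undefined;

interface ImportResult {
  imported: IRangeInfo[]; // selectors that could be mapped offline
  skipped: Selector[];    // selectors that need a mounted DOM or are unknown
}

function importSelectors(
  selectors: Selector[],
  converters: Partial<Record<string, OfflineConverter>>,
): ImportResult {
  const result: ImportResult = { imported: [], skipped: [] };
  for (const selector of selectors) {
    // Only look at own keys so a type such as "constructor" is never matched.
    const convert = Object.prototype.hasOwnProperty.call(converters, selector.type)
      ? converters[selector.type]
      : undefined;
    const rangeInfo = convert ? convert(selector) : undefined;
    if (rangeInfo) {
      result.imported.push(rangeInfo);
    } else {
      result.skipped.push(selector);
    }
  }
  return result;
}
```

Keeping the converters in a lookup table means a new selector type can be supported by registering one function, without touching the import loop.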
The use case for importing an annotation set in Thorium can be this:
Currently steps 1, 10, 11, 12 and even 13 are not implemented in the develop branch.
The top priority will be the “convert Selector to Range” routine. |
Selector highlighting demonstration: https://github.com/edrlab/w3c-annotation-selector-demo |
I propose a selector that can be mapped to IRangeInfo without DOM context:
{
  "type": "RangeSelector",
  "startSelector": {
    "type": "CssSelector",
    "value": "#intro > p:nth-child(2)",
    "refinedBy": {
      "type": "TextNodeIndexSelector",
      "value": 0,
      "refinedBy": {
        "type": "CodeUnitSelector",
        "value": 4
      }
    }
  },
  "endSelector": {
    "type": "CssSelector",
    "value": "#intro > p:nth-child(3)",
    "refinedBy": {
      "type": "TextNodeIndexSelector",
      "value": 2,
      "refinedBy": {
        "type": "CodeUnitSelector",
        "value": 11
      }
    }
  }
}
A RangeSelector with a CssSelector and 2 new selectors, one for the text node index in a normalized range and one for the code unit (character index) position, can easily be mapped to IRangeInfo:
{
  "rangeInfo": {
    "endContainerChildTextNodeIndex": 2,
    "endContainerElementCssSelector": "#intro > p:nth-child(3)",
    "endOffset": 11,
    "startContainerChildTextNodeIndex": 0,
    "startContainerElementCssSelector": "#intro > p:nth-child(2)",
    "startOffset": 4
  },
  "cleanBefore": " Some text. The ",
  "cleanText": "quick brown fox jumps over the lazy dog. The lazy white dog sleeps",
  "cleanAfter": " with the crazy fox. Image wit",
  "rawBefore": " Some text.\n The ",
  "rawText": "quick brown fox jumps over the lazy dog.\n The lazy white dog sleeps",
  "rawAfter": " with the crazy fox.\n Image wit"
} |
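The mapping between the two JSON objects above is a plain field-by-field copy, which could look roughly like the following sketch. The interface and function names are illustrative assumptions: `TextNodeIndexSelector` and `CodeUnitSelector` are the types proposed in this comment, not part of the W3C model, and `rangeSelectorToRangeInfo` is not an existing r2-navigator-js function.

```typescript
// Field names mirror the rangeInfo object shown above.
interface IRangeInfo {
  startContainerElementCssSelector: string;
  startContainerChildTextNodeIndex: number;
  startOffset: number;
  endContainerElementCssSelector: string;
  endContainerChildTextNodeIndex: number;
  endOffset: number;
}

// Shapes of the proposed selector chain (assumed to be exactly as in the JSON above).
interface CodeUnitSelector {
  type: "CodeUnitSelector";
  value: number; // code-unit offset inside the text node
}
interface TextNodeIndexSelector {
  type: "TextNodeIndexSelector";
  value: number; // index of the text node under the selected element
  refinedBy: CodeUnitSelector;
}
interface CssSelector {
  type: "CssSelector";
  value: string; // CSS selector of the container element
  refinedBy: TextNodeIndexSelector;
}
interface RangeSelector {
  type: "RangeSelector";
  startSelector: CssSelector;
  endSelector: CssSelector;
}

// Pure data mapping: no DOM access is needed at any point.
function rangeSelectorToRangeInfo(selector: RangeSelector): IRangeInfo {
  const { startSelector, endSelector } = selector;
  return {
    startContainerElementCssSelector: startSelector.value,
    startContainerChildTextNodeIndex: startSelector.refinedBy.value,
    startOffset: startSelector.refinedBy.refinedBy.value,
    endContainerElementCssSelector: endSelector.value,
    endContainerChildTextNodeIndex: endSelector.refinedBy.value,
    endOffset: endSelector.refinedBy.refinedBy.value,
  };
}
```

Because the function only reads the selector object, it can run in the main process at import time, before any publication document is loaded.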
Currently we can export and import an annotation set with the Readium annotation spec, but the annotation selector matching is locked to the r2-navigator-js IRangeInfo model. We need an interface to accept and parse any annotation selector from the W3C annotation spec.
Support of the W3C annotation data model selectors https://www.w3.org/TR/2017/REC-annotation-model-20170223/#selectors :
The Readium annotations spec https://github.com/readium/annotations?tab=readme-ov-file#111-selector needs to be updated to fully support the W3C annotation selector model.
FragmentSelector :
TextFragment :
conformsTo
application/xhtml+xml
Fragment Identifier spec and Scroll to Text Fragment spec: https://wicg.github.io/scroll-to-text-fragment/
Not supported yet; we need to find a robust library to handle this.
audiobook media flags:
CssSelector :
example :
refined by a textPositionSelector inside the node container
Supported on apache-annotator
XPathSelector
ex:
Not supported in either apache-annotator or r2-navigator-js
Need to think about how to deal with this selector, and whether it will be parsed.
Note: used by the Hypothesis client https://github.com/hypothesis/client/blob/main/src/annotator/anchoring/xpath.ts
TextQuoteSelector
ex:
Supported on apache-annotator
Do not generate for LCP-protected publications. Note from the W3C spec:
Implementation with apache-annotator:
https://annotator.apache.org/docs/api/modules/selector.html#textquoteselectormatcher
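For readers unfamiliar with the TextQuoteSelector fields (`exact`, `prefix`, `suffix` in the W3C model), the following is a minimal sketch of quote matching over a plain string. It is not the apache-annotator matcher linked above, it skips the text normalization a real implementation needs, and `matchTextQuote` is a hypothetical name.

```typescript
// Field names follow the W3C Web Annotation Data Model.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}
interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number; // inclusive, in UTF-16 code units
  end: number;   // exclusive
}

// Return every occurrence of `exact` whose surrounding text also matches the
// optional prefix and suffix. More than one result means the quote is ambiguous.
function matchTextQuote(text: string, selector: TextQuoteSelector): TextPositionSelector[] {
  const matches: TextPositionSelector[] = [];
  const prefix = selector.prefix ?? "";
  const suffix = selector.suffix ?? "";
  if (selector.exact.length === 0) return matches;

  let from = 0;
  while (true) {
    const start = text.indexOf(selector.exact, from);
    if (start === -1) break;
    const end = start + selector.exact.length;
    const prefixOk =
      start >= prefix.length && text.slice(start - prefix.length, start) === prefix;
    const suffixOk = text.slice(end, end + suffix.length) === suffix;
    if (prefixOk && suffixOk) {
      matches.push({ type: "TextPositionSelector", start, end });
    }
    from = start + 1; // continue after this occurrence's first character
  }
  return matches;
}
```

The result is expressed as text positions, so it still needs a loaded document to be turned into a DOM Range.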
TextPositionSelector
ex:
apache annotator implementation: https://annotator.apache.org/docs/api/modules/selector.html#textpositionselectormatcher
RangeSelector
ex:
supported on apache-annotator
range to RangeSelector :
Just a POC example, it needs to be tested!
A RangeSelector can be parsed without DOM content loaded in memory, with just a mapping to the r2-navigator-js IRangeInfo
A RangeSelector matcher is implemented here with apache-annotator, usable like the other selectors.