Replace textDirection property with recommendation for including control characters in the text #336

Closed
aaronpk opened this issue Aug 4, 2016 · 6 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. invalid

Comments

@aaronpk
Member

aaronpk commented Aug 4, 2016

The textDirection property is an unproven theoretical solution to setting the base text direction. My understanding is it was added to this spec before it had any real implementation experience.

Currently, there are solutions in Unicode string encoding itself that can accomplish setting the base text direction. Using the existing Unicode solution has the added benefit of being supported by many systems without them doing any extra work. There is a great article by the i18n group that covers a handful of control characters and describes how to use them to set the base text direction: https://www.w3.org/International/questions/qa-bidi-unicode-controls

My recommendation for the annotation spec is to drop the textDirection property and instead include a recommendation to use the appropriate control characters as necessary.
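For illustration only, here is a minimal Python sketch of what "include the appropriate control characters" could look like for an embedded string. The helper name `wrap_with_direction` is hypothetical, and the choice of the embedding characters (RLE or LRE, closed by PDF) is an assumption made for the example; the linked article discusses which controls suit which situation.

```python
# Unicode directional formatting characters.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_with_direction(text: str, direction: str) -> str:
    """Hypothetical helper: wrap text in an embedding so its base direction is explicit."""
    if direction == "rtl":
        return RLE + text + PDF
    if direction == "ltr":
        return LRE + text + PDF
    raise ValueError(f"unsupported direction: {direction!r}")


# A Hebrew word followed by an English word, forced to a right-to-left base direction.
body_value = wrap_with_direction("\u05E9\u05DC\u05D5\u05DD world", "rtl")
```

The wrapped string carries its direction with it, so no separate property is needed, but every consumer of the string now sees two extra code points.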

@azaroth42
Collaborator

azaroth42 commented Aug 4, 2016

Merging from #335:

The properties are not only for embedded strings (which in JSON we can expect to be Unicode) but for arbitrary resources with URIs. I have no idea how PDFs store text strings (for example) or how well the control characters are implemented in those strings, but I can point you to many instances of older or just badly implemented XML documents in a huge variety of encodings. As these resources can take the role of the body of the Annotation, the Unicode proposal isn't sufficient to address the requirements.

For example:

{"id": "http://example.org/annos/1",
  "type": "Annotation",
  "motivation": "commenting",
  "body": {
    "id": "http://example.com/old/text/thing",
    "type": "Text",
    "format": "application/old-text-format",
    "textDirection": "rtl",
    "processingLanguage": "ar",
    "created": "1997-08-17"
  },
  "target": "http://example.net/thing-that-document-is-about"
}

Note that the properties are listed under External Web Resources [1], not under Embedded Textual Body [2] for just this reason.

[1] https://www.w3.org/TR/annotation-model/#external-web-resources
[2] https://www.w3.org/TR/annotation-model/#embedded-textual-body
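
As a rough sketch of how a client might consume the example annotation above, the Python below reads the two hints from an external body without fetching or decoding the resource itself. The function name `body_text_hints` and the choice to return `None` for missing values are assumptions made for illustration, not something the spec prescribes.

```python
import json

# The example annotation from this comment, reproduced as a JSON string.
annotation_json = """
{"id": "http://example.org/annos/1",
  "type": "Annotation",
  "motivation": "commenting",
  "body": {
    "id": "http://example.com/old/text/thing",
    "type": "Text",
    "format": "application/old-text-format",
    "textDirection": "rtl",
    "processingLanguage": "ar",
    "created": "1997-08-17"
  },
  "target": "http://example.net/thing-that-document-is-about"
}
"""


def body_text_hints(annotation: dict) -> dict:
    """Hypothetical helper: pull direction and language hints off the body."""
    body = annotation.get("body")
    if not isinstance(body, dict):
        # The body may be a bare URI string, in which case there are no hints.
        return {"textDirection": None, "processingLanguage": None}
    return {
        "textDirection": body.get("textDirection"),
        "processingLanguage": body.get("processingLanguage"),
    }


hints = body_text_hints(json.loads(annotation_json))
```

The point of the sketch is that the hints are available from the annotation alone, whatever encoding the resource at the body URI happens to use.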

@gsergiu

gsergiu commented Aug 5, 2016

Also merged from #335 (comment):

I see it exactly the opposite.

One might need to know the text direction for the correct representation of text embedded in the annotations (TextualBody), not for the correct representation of external resources.
The external resources must include, inside the "files/bitstreams", all the information required for a correct representation. It is not the responsibility of annotations to correct wrong html/pdf/xml.
(It might be a use case ... but it is not included in the current version of the standard.)

Some selectors would probably need this information; the "textDirection" might be relevant for text position selection. In that case ... the value must be set inside the selector and not inside the target/body.

@BigBlueHat
Member

We can't assume that everything is in Unicode, and storing this information within the Annotation document does help with selection, as @gsergiu points out. It should not go on the selector itself, as that would only affect the direction of the selected text and say nothing about the original document's text direction.

Given that we can't assume Unicode for external resources, we should keep textDirection as proposed. Implementors are more than welcome to use Unicode control characters within Embedded Textual Bodies as @azaroth42 points out.

@gsergiu

gsergiu commented Aug 5, 2016

@BigBlueHat why do you need the textDirection for external resources? What can you do with it other than proper text selection in selectors? (The annotation itself has nothing to do with the correct representation of external resources!)

@azaroth42
Collaborator

Discussed on the teleconference of 2016-08-05. The resolution was that there is no new information that wasn't already discussed. The proposal to use Unicode control characters does not address the established need to cover non-Unicode content, however much we might like to simply require Unicode everywhere, retroactively.

Whether the features are valuable is the subject of #335, and thus we're closing this issue as the concrete proposal does not cover established requirements. Thank you for the proposal and bearing with us through the process!

@tomerm

tomerm commented Aug 9, 2016

@aaronpk On modern operating systems (e.g. Windows, iOS, Android) text direction is associated with text rendering. In storage, text with LTR, RTL, or Auto text direction is represented the same way. Thus text direction is best handled during the rendering phase.

Unicode control characters or UCC (RLE, LRE, etc.) are a perfectly valid means of enforcing text direction at rendering time, and https://www.w3.org/International/questions/qa-bidi-unicode-controls indeed includes a lot of good use cases / examples.

However, using UCC to turn text direction into a storage-level property is cumbersome, for several reasons. For example:

  1. Using UCC this way rests on a hidden assumption that every rendering engine that displays text is fully compliant with the Unicode Bidirectional Algorithm (UBA). In reality this is not so. Moreover, some rendering engines (Adobe Reader, for example) may take quite a different approach to rendering bidi (Arabic / Hebrew) text.
  2. Search / sort capabilities in ALL technologies / toolkits that render or process bidi text (web technologies, back-end technologies, etc.) would have to be altered to ignore UCC (when they are injected into text to convey text direction) during search, sort, concatenation, and similar text-based operations.
  3. Editable contexts (mostly rich text editors) use higher-level protocols (e.g. HTML markup in web browsers) to convey text direction, much like the textDirection property discussed above. Supporting both a higher-level protocol and UCC would require extra mapping / conversion.
  4. Many higher-level protocols (i.e. GUI SDKs / toolkits) allow text direction to be manipulated at the API level (instead of using UCC at the text level). Just a couple of examples:
    a. textDir is supported by all widgets in Dojo Toolkit
    b. the setTextDirection function in Android
    Supporting UCC for representing text direction in storage would require modifications to all of those technologies / toolkits.
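
To make point 2 concrete, here is a minimal Python sketch of the extra normalization step a search routine would need once directional controls are stored inside the text. The names `strip_bidi_controls` and `contains` are hypothetical, and the set of characters stripped is an assumption about which controls a system would choose to ignore.

```python
# Bidi control characters:
# LRM, RLM, ALM, LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI
BIDI_CONTROLS = (
    "\u200E\u200F\u061C"
    "\u202A\u202B\u202C\u202D\u202E"
    "\u2066\u2067\u2068\u2069"
)

# Translation table that maps every control character to None (i.e. deletes it).
_STRIP_TABLE = {ord(ch): None for ch in BIDI_CONTROLS}


def strip_bidi_controls(text: str) -> str:
    """Hypothetical helper: remove directional controls before comparing text."""
    return text.translate(_STRIP_TABLE)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores bidi controls on both sides."""
    return strip_bidi_controls(needle) in strip_bidi_controls(haystack)


# A stored value wrapped in RLE ... PDF to convey its base direction.
stored = "\u202B" + "abc def" + "\u202C"
```

Without the stripping step, `stored == "abc def"` is false even though the visible text is identical, and the same mismatch shows up in sorting and concatenation.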

@plehegar plehegar added i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. and removed i18n-review labels Mar 11, 2021
6 participants