Lack of clarity on how to encode N'Ko te-kerende #36

Open
r12a opened this issue May 16, 2023 · 2 comments
Labels
doc:nkoo gap The first comment in this issue is read by the gap-analysis document. i:encoding Characters & encoding i:segmentation Grapheme/word segmentation & selection l:nqo N'Ko script & language p:ok s:nkoo x:nkoo

Comments

@r12a
Contributor

r12a commented May 16, 2023

This issue is applicable to N'Ko.

Certain constructs in N'Ko text mean 'each and every ....', and they are written with a dash on the baseline with a space on either side. For example:
[Image: Screenshot 2023-05-16 at 17 51 06]

This is also used in other locations where we might use a dash in Latin text.

The question is which character is appropriate for the te-kerende and for other similar-looking uses.

More:

The GAP

Research showed that users are using ߺ U+07FA NKO LAJANYALAN with spaces either side for this. However, that character's main stated role in the Unicode Standard is to act like the Arabic tatweel and extend the baseline while joining the characters either side.
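To make the character under discussion concrete, here is a minimal Python sketch that looks up a few basic properties of U+07FA with the standard `unicodedata` module. The helper name `describe` is hypothetical. Note that `unicodedata` does not expose the joining behavior itself (that lives in the Unicode Character Database file `ArabicShaping.txt`), so this only confirms the character's identity and general category.

```python
import unicodedata

# U+07FA NKO LAJANYALAN, the character named in the paragraph above.
LAJANYALAN = "\u07fa"

def describe(ch: str) -> dict:
    """Return a few basic properties that Python's unicodedata exposes for ch.

    Hypothetical helper for illustration only.
    """
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
    }

info = describe(LAJANYALAN)
print(info)
```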

This approach works in all browsers.

See an example on page 3, column 2 (from the right), below the picture.

The Unicode Standard doesn't provide any advice on this topic. The original proposal included a request for a te-kerende character, but it was not adopted.

Action taken

The question was raised at a Unicode Script Ad Hoc meeting.

Outcomes

The Unicode Script Ad Hoc committee considered the matter and agreed that the te-kerende should be represented using <space><lajanyalan><space>.
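As a rough illustration of the agreed sequence, the Python sketch below builds a te-kerende as space + U+07FA + space and tells that use apart from a lajanyalan that sits directly between letters. The function names and the labels "te-kerende" and "extender" are assumptions made for this example; they do not come from the Unicode Standard or from any library.

```python
SPACE = "\u0020"
LAJANYALAN = "\u07fa"  # U+07FA NKO LAJANYALAN

# The representation described above: <space><lajanyalan><space>.
TE_KERENDE = SPACE + LAJANYALAN + SPACE

def join_with_te_kerende(left: str, right: str) -> str:
    """Join two pieces of text with the <space><lajanyalan><space> sequence."""
    return left + TE_KERENDE + right

def classify_lajanyalan(text: str) -> list:
    """Label every U+07FA in text by its immediate neighbours.

    Returns (index, label) pairs. A lajanyalan with a space on both sides
    is labelled "te-kerende"; anything else is labelled "extender".
    This is a heuristic for illustration, not a normative rule.
    """
    results = []
    for i, ch in enumerate(text):
        if ch != LAJANYALAN:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        label = "te-kerende" if before == SPACE and after == SPACE else "extender"
        results.append((i, label))
    return results

print(classify_lajanyalan(join_with_te_kerende("ab", "cd")))
```

Because the check depends only on the neighbouring spaces, a lajanyalan at the very start or end of a string is reported as an extender. A real tool would need to decide how to treat that case.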

Priority

This is already a de facto standard.

@r12a r12a added s:nkoo gap The first comment in this issue is read by the gap-analysis document. p:advanced The gap-analysis priority is Advanced. i:segmentation Grapheme/word segmentation & selection doc:nkoo x:nkoo labels May 16, 2023
@r12a
Contributor Author

r12a commented May 16, 2023

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include:
N'Ko

@jfkthame

The linked document at http://cormand.huma-num.fr/maninkabiblio/periodiques/silabosoona5.pdf also shows an example of one of the reasons <space><lajanyalan><space> is an unsatisfactory representation: see the first (right-hand) column on page 6, lines 2-3, where there is an occurrence of "ߞߏ ߺ ߏ ߺ ߞߏ߫" with the line wrapped at the space before the second te-kerende.

It is clearly stated in https://www.unicode.org/L2/L2015/15338-n4706-nko-additions.pdf that "A line can break after but not before a TE-KERENDE", but because users are forced to add spaces around it (because lajanyalan has completely different joining/rendering behavior), this spurious line-break occurs.
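The problem can be sketched in a few lines of Python. The function below (a hypothetical name) lists the positions of spaces that are immediately followed by lajanyalan + space, which is where a wrapper that breaks at spaces could end a line and leave the te-kerende at the start of the next one. This assumes a simple break-at-spaces model; real layout engines follow the Unicode Line Breaking Algorithm (UAX #14), which is more involved.

```python
LAJANYALAN = "\u07fa"  # U+07FA NKO LAJANYALAN

def breaks_before_te_kerende(text: str) -> list:
    """Return indices of spaces directly followed by <lajanyalan><space>.

    Under a naive break-at-spaces model, wrapping at any of these spaces
    puts the te-kerende at the start of the following line.
    """
    return [
        i
        for i in range(len(text) - 2)
        if text[i] == " " and text[i + 1] == LAJANYALAN and text[i + 2] == " "
    ]

# The example quoted above, containing two te-kerende.
sample = "ߞߏ ߺ ߏ ߺ ߞߏ߫"
for index in breaks_before_te_kerende(sample):
    print(f"possible unwanted break at string index {index}")
```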

The fact that this usage "is a de facto standard" does not, I think, indicate that it is a good or appropriate way to encode te-kerende; only that users have had to make do with the character repertoire on their keyboard. It's like users representing the copyright symbol with "(c)" because they don't know how to type "©".

Te-kerende should (in my opinion) have been encoded as a character in its own right, as was proposed in N4706, and could then have easily been made available on N'Ko keyboard layouts.

It would still be possible to rectify this, although unfortunately there will be a legacy corpus of documents using the <space><lajanyalan><space> hack. But the sooner a real te-kerende is encoded, the sooner usage can begin to migrate to the better representation.

@r12a r12a added the l:nqo N'Ko script & language label May 6, 2024
@r12a r12a added p:ok i:encoding Characters & encoding and removed p:advanced The gap-analysis priority is Advanced. labels May 16, 2024