This repository has been archived by the owner on Mar 7, 2023. It is now read-only.

Some latest round of Khmer encoding–shaping discussions #27

Open
lianghai opened this issue Nov 3, 2020 · 23 comments

@lianghai
Contributor

lianghai commented Nov 3, 2020

Prompted by @MakaraSok’s recent talk at the Unicode Conference, a small group of people, including Makara, @NorbertLindenberg, and me, has been setting up a new round of discussions about the Khmer script’s encoding–shaping issues. The group met on 6 Nov, 20 Nov, 4 Dec, and 18 Dec 2020.

The next meeting is scheduled for Friday 8 January 2021:

Liang Hai is inviting you to a scheduled Zoom meeting.

Topic: Khmer clusters
Time: Fri 8 Jan 2021, 9:00 am Cambodia (UTC+7, see also time zone conversion and calendar file)
Recurring: The same time, every two weeks (Fri 22 Jan, Fri 5 Feb, …)

Join Zoom Meeting
https://zoom.us/j/93120577362

Meeting ID: 931 2057 7362
Find your local number for dialing in: https://zoom.us/u/acI3YWbZaU

@MakaraSok

@mcdurdin

mcdurdin commented Nov 6, 2020

Ref: https://software.sil.org/downloads/r/mondulkiri/Mondulkiri-7.100-developer.zip (via Didi); see under 'documentation'.

@mcdurdin

mcdurdin commented Nov 6, 2020

Ref: Makara's paper on 'Spoof-Vulnerable Rendering in Khmer Unicode Implementations' https://www.sil.org/system/files/reapdata/15/34/62/153462537465381054623906304930919921193/Spoof_Vulnerable_Rendering_in_Khmer_Unic.pdf

@NorbertLindenberg

Defining Khmer clusters

@lianghai
Contributor Author

lianghai commented Nov 6, 2020

This is the full slide deck I briefly touched on during the meeting, originally prepared for the Unicode Conference last month: An open knowledge base for Indic text shaping

@MakaraSok

MakaraSok commented Nov 6, 2020

@lianghai
Contributor Author

An example of how to graphically define the written units of Khmer, independent from how they’re encoded: https://docs.google.com/spreadsheets/d/1YS0OJfw4Fr6wVh-0oyUvZc4Pi-2Dond4pnqFHqTEEi4/edit?usp=sharing

@MakaraSok

MakaraSok commented Nov 20, 2020

A list of words from the Khmer Official [Chuon Nath] Dictionary with Robat (U+17CC): http://dictionary.tovnah.com/reg-search?qu=%E1%9F%8C

  1. កក៌ដ 2. កប៌ូរ 3. កាណ៌ 4. កាប៌ាស
  5. ឋានសួគ៌ 6. តប​ធម៌ 7. តូយ៌តន្ត្រី 8. ទក្ខិណាព័ត៌
  9. ទិដ្ឋធម៌ 10. ទុគ៌ត 11. ទុគ៌ម 12. ទុជ៌ន
  13. ទុព៌ល 14. ទុយ៌ស 15. ទេយ្យធម៌ 16. ទេសធម៌
  17. ធម៌ 18. នាយក​ធម៌ 19. និយ្យានិក​ធម៌ 20. នីវណរ​ធម៌
  21. គភ៌ 22. បញ្ច​ពណ៌ 23. បរិបូណ៌ 24. បរិបូណ៌
  25. បាបធម៌ 26. បូណ៌មី 27. បូព៌ 28. បូព៌​ទិស
  29. បូព៌​និមិត្ត 30. បោក្ខរព័ស៌ 31. ពណ៌ 32. ពណ៌នា
  33. ពាណ៌នា 34. ពិពណ៌នា 35. ពោធិបក្ខិយធម៌ 36. ព័ត៌មាន
  37. ព័ត៌មាន​កាល 38. ព្យាធិ​ធម៌ 39. មទ្រីបាព៌ 40. មាគ៌
  41. មាគ៌ា 42. យុត្តិធម៌ 43. លម្អក់​ព័ណ៌ 44. លោមព័ណ៌
  45. លំអក់ព័ណ៌ 46. វណ៌ 47. វិបយ៌ាយ 48. វិបយ៌ាស
  49. វិបរិណាមធម៌ 50. វិសគ៌ៈ 51. វិសទព័ណ៌ 52. សកដមាគ៌ា
  53. សគ៌ៈ 54. សង្ខតធម៌ 55. សព៌ជ្ញ 56. សព៌ាង្គ
  57. សព៌េជ្ញ 58. សព៌េជ្ញ​សាស្ដា 59. សម្បូណ៌ 60. សម្បូណ៌
  61. សិទ្ធាថ៌ 62. សុជីវធម៌ 63. សុពណ៌ 64. សួគ៌
  65. សួគ៌ា 66. ស្លាធម៌ 67. ហៃមពណ៌ 68. អឃ៌
  69. អជ៌ុន 70. អថ៌ 71. អធម៌ 72. អនាយ៌
  73. អន្តរាយិកធម៌ 74. អយុត្តិធម៌ 75. អសង្ខតធម៌ 76. អាឃ៌
  77. អាថ៌ 78. អាថ៌កំបាំង 79. អាយ៌េន 80. ឧន្មាគ៌ា
  81. ឆកាមាវចរ​សួគ៌ 82. ជង្ឃមាគ៌ា 83. ជាតិធម៌
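For anyone who wants to reproduce a list like this from their own word list, here is a minimal Python sketch. It assumes the words are already available as plain strings; the `sample` list and the helper name `contains_robat` are made up for illustration and are not part of the dictionary site. It also shows that the `qu=%E1%9F%8C` parameter in the search URL above is just the percent-encoded UTF-8 form of U+17CC.

```python
from urllib.parse import quote

ROBAT = "\u17cc"  # KHMER SIGN ROBAT


def contains_robat(word: str) -> bool:
    """Return True if the word contains U+17CC anywhere in its code point sequence."""
    return ROBAT in word


# Hypothetical sample: two words from the list above, plus one without Robat.
sample = [
    "\u1792\u1798\u17cc",              # THO, MO, ROBAT (item 17 in the list)
    "\u1796\u178e\u17cc",              # PO, NNO, ROBAT (item 31 in the list)
    "\u1781\u17d2\u1798\u17c2\u179a",  # KHA, COENG, MO, AE, RO (no Robat)
]

with_robat = [w for w in sample if contains_robat(w)]

# Percent-encode the single character the same way the search URL does.
query = quote(ROBAT)

print(len(with_robat), query)
```

This only checks for the presence of the code point; it says nothing about where Robat sits within a cluster, which is the part under discussion here.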

@lianghai
Contributor Author

lianghai commented Nov 20, 2020

Meeting on 20 Nov 2020

Action items

  1. Review the names list and propose changes.

  2. @lianghai: Improve the “Khmer: graphical analysis” spreadsheet with suggestions incorporated.

  3. @lianghai: Share a draft of data files like the Mongolian ones: https://github.com/lianghai/mongolian/tree/utn/utn/data

  4. Investigate collation algorithms.

Links and files shared in Zoom chat

@MakaraSok

MakaraSok commented Nov 20, 2020

Some of Problems concerning Fonts used in Writings.pdf by KEO Linet, Department of Khmerisation, Lexicography and Translation of the National Language Institute (NLI) of the Royal Academy of Cambodia (RAC).

@n8willis

Hi everyone,

Unfortunately I wasn't able to be eyes-open for the Nov 20 meeting (will try harder for the next one!) although I really wanted to be there.

At the "meta-question" level, would it be possible for people who are posting downloadable resources (e.g., PDF slide decks) to also mention what the licensing is on those documents? If it's possible, of course. I know it might not always be so. Or, at least, to mention something about the source, if not a legalese-formal 'license' per se....

Certainly that's not a huge impediment to viewing anything at present, but I have a bad tendency to collect such resources locally and save them for future reference, and over time it has kind of become a problem when I can't recall what the origin & circumstances of a PDF are.

Don't mean this to be a burden on anyone; perhaps just consider it a plea for future help. Folks posting their own slide decks is pretty straightforward, but I'm less clear about some of the external links and material in .zip files.

@MakaraSok

I'm less clear about some of the external links and material in .zip files

Please let us know the exact links so that we can probably add the metadata for you.

For the .zip files, see "Other Resources" at: https://software.sil.org/mondulkiri/.

@n8willis

Well, "Some of Problems concerning Fonts used in Writings.pptx" certainly is one. I don't see a date on it, and I can't tell what organization the author is from (searching doesn't turn anything up).

lianghai added a commit to lianghai/unicode that referenced this issue Dec 3, 2020
@MakaraSok

Khmer sorting order rules based on the existing method used in the Khmer-Khmer dictionary published in 1967: https://docs.google.com/document/d/1n64obcr8PyYX9Xgk371xk3i0euOTjgMDRKz2TjjOpN0/edit?usp=sharing

@lianghai
Contributor Author

lianghai commented Dec 4, 2020

Additional files discussed today:

Documents to review before the next meeting (two weeks later):

  1. Section 7, Unicode Encoding, and section 8, Text Processing (8.2 and 8.3 are about sorting and font) of Makara’s draft specification.

  2. https://web.archive.org/web/20150105024205/http://www.panl10n.net/english/final%20reports/pdf%20files/Cambodia/CAM01.pdf

  3. Makara’s revised document on “Khmer sorting order rules based on the existing method used in the Khmer-Khmer dictionary published in 1967”.

@lianghai
Contributor Author

lianghai commented Dec 4, 2020

Note that using Richard’s character pickers one can easily construct an arbitrary string, without being restricted by a keyboard layout: https://r12a.github.io/pickers/khmr/
cc @MakaraSok @iwsfutcmd
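
The same thing can be done in a script when a picker isn't handy. Below is a minimal Python sketch that builds a string directly from code points and lists what it contains; the helper names `from_code_points` and `describe` are hypothetical, and the sequence used is arbitrary, chosen only to show that nothing restricts what can be constructed. It is not a claim about a valid or well-ordered Khmer cluster.

```python
import unicodedata


def from_code_points(*code_points: int) -> str:
    """Build a string from arbitrary code points, with no keyboard layout involved."""
    return "".join(chr(cp) for cp in code_points)


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME' so the exact sequence is visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Arbitrary test sequence: KA, COENG, KA, ROBAT.
s = from_code_points(0x1780, 0x17D2, 0x1780, 0x17CC)

for line in describe(s):
    print(line)
```

Dumping the code points this way is also handy for checking what order a given keyboard or input method actually produced.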

@MakaraSok

This zip file contains research documents by Javier Sola of the then Open Forum of Cambodia:

  • About Khmer Script.pdf
  • Changes needed in Unicode 4.0 - v2.0.doc
  • Difficult to Display words in Khmer.odt
  • DisplayKhmerScript.pdf, and
  • Return_of_ZWSP_06_Sep_2008.pdf

Javier's.zip

@MakaraSok

Here are some more from Javier. The zip file included here is given as is. Executable files have been excluded by the owner as they have been falsely flagged as viruses.

UNICODE-20210120T060911Z-001.zip

@MakaraSok

Slide 23 of this document issued by the MoEYS explains explicitly where the Consonant Shifters (aka Register Shifters) should go when typing:

PDF version: How_to_type_Khmer_Unicode.pdf

Source: http://krou.moeys.gov.kh/kh/article/item/download/595_aef67c4f54defb5c2d63718a0e120456.html

@MakaraSok

The link below contains the translated version of the "How to type Khmer Unicode" above among other things related to Khmer Unicode. The file name is "How_to_type_Khmer_Unicode.ver1.1km.pdf".

https://www.mef.gov.kh/documents/fonts/khmer-unicode-for-mef.zip

Since this material is on the ministry website, it has "most likely" been used/adopted by the ministry.

The highlight is that the Character Ordering is different from the Unicode Standard.

@NorbertLindenberg

I can’t read Khmer, but it appears that How_to_type_Khmer_Unicode.ver1.1km.pdf differs in some ways, e.g. by adding a discussion of “Nonbreakable Space”, from the English 1.0 version:
https://web.archive.org/web/20180712194920/http://khmeros.info/download/KhmerUnicodeTyping.pdf

Is there an English 1.1 version?

@MakaraSok

For our record, here is the link to the newly drafted Khmer Encoding Research: https://docs.google.com/document/d/18KlDJkea9k57zFQ52V6JFvVNOYm-y-4hJJLmqudmWrE/edit#.

The next group meeting's discussion will be centered on this document.
